PhaseLift: Exact and Stable Signal Recovery from Magnitude 
Measurements via Convex Programming 

Emmanuel J. Candès, Thomas Strohmer, and Vladislav Voroninski



Suppose we wish to recover a signal $x \in \mathbb{C}^n$ from $m$ intensity measurements of the form
$|\langle x, z_i\rangle|^2$, $i = 1, 2, \ldots, m$; that is, from data in which phase information is missing. We prove
that if the vectors $z_i$ are sampled independently and uniformly at random on the unit sphere,
then the signal $x$ can be recovered exactly (up to a global phase factor) by solving a convenient
semidefinite program, a trace-norm minimization problem; this holds with large probability
provided that $m$ is on the order of $n \log n$, and without any assumption about the signal
whatsoever. This novel result demonstrates that in some instances, the combinatorial phase retrieval
problem can be solved by convex programming techniques. Finally, we also prove that our
methodology is robust vis-à-vis additive noise.

1 Introduction 

In many applications, one would like to acquire information about an object but it is impossible or 
very difficult to measure and record the phase of the signal. The problem is then to reconstruct the 
object from intensity measurements only. A problem of this kind that has attracted a considerable 
amount of attention over the last hundred years or so, is of course that of recovering a signal or 
image from the intensity measurements of its Fourier transform [15, 16] as in X-ray crystallography. 
As is well-known, such phase retrieval problems are notoriously difficult to solve numerically. 

Formally, suppose $x \in \mathbb{C}^n$ is a discrete signal and that we are given information about the squared
modulus of the inner product between the signal and some vectors $z_i$, namely,

$$b_i = |\langle x, z_i\rangle|^2, \quad i = 1, \ldots, m. \tag{1.1}$$

In truth, we would like to know $\langle x, z_i\rangle$ and record both phase and magnitude information but
can only record the magnitude; in other words, phase information is lost. In the classical example
discussed above, the $z_i$'s are complex exponentials at frequency $\omega_i$ so that one collects the squared
modulus of the Fourier transform of $x$. Of course, many other choices for the measurement vectors
$z_i$ are frequently discussed in the literature; see [2, 12] for instance.

We wish to recover $x$ from the data vector $b$, and suppose first that $x$ is known to be real valued a
priori. Then assuming that $x$ is uniquely determined by $b$ up to a global sign, the recovery may be
cast as a combinatorial optimization problem: find a set of signs $\sigma_i$ such that the solution to the
linear equations $\langle x, z_i\rangle = \sigma_i \sqrt{b_i}$, call it $\hat{x}$, obeys $|\langle \hat{x}, z_i\rangle|^2 = b_i$. Clearly, there are $2^m$ choices for
$\sigma_i$ and only two choices of these signs yield $x$ up to global phase. The complex case is harder yet,
since resolving the phase ambiguities now consists of finding a collection $\sigma_i$ of complex numbers,
each being on the unit circle. Formalizing matters, it has been shown that at least one version of
the phase retrieval problem is NP-hard [20]. Thus, one of the major challenges in the field is to
find conditions on $m$ and $z_i$ which guarantee efficient numerical recovery.
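To make the combinatorial formulation concrete, the following is a minimal sketch of the brute-force sign search in the real-valued case. It is only an illustration under stated assumptions: the dimension is fixed to $n = 2$, the toy signal and measurement vectors are made up for this example, and the helper names `solve_2x2` and `consistent_sign_patterns` are hypothetical. For each of the $2^m$ sign patterns it solves the first two linear equations and keeps the pattern only if the remaining equations also hold.

```python
from itertools import product


def solve_2x2(rows, rhs):
    """Solve a 2x2 real linear system by Cramer's rule."""
    (a, b), (c, d) = rows
    det = a * d - b * c
    return ((rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - rhs[0] * c) / det)


def consistent_sign_patterns(z, b, tol=1e-9):
    """Enumerate all 2^m sign patterns; keep those whose linear solution fits every equation."""
    hits = []
    for signs in product((1.0, -1.0), repeat=len(z)):
        rhs = [s * bi ** 0.5 for s, bi in zip(signs, b)]
        # Solve the first two equations <x, z_i> = sigma_i * sqrt(b_i) ...
        xh = solve_2x2(z[:2], rhs[:2])
        # ... and keep the candidate only if it satisfies all m equations.
        if all(abs(xh[0] * zi[0] + xh[1] * zi[1] - r) < tol for zi, r in zip(z, rhs)):
            hits.append((signs, xh))
    return hits


# Toy instance (assumed for illustration): n = 2 unknowns, m = 4 measurements.
x = (1.0, 2.0)
z = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0)]
b = [(x[0] * zi[0] + x[1] * zi[1]) ** 2 for zi in z]
hits = consistent_sign_patterns(z, b)
```

On this instance only two of the sixteen sign patterns survive, giving $x$ and $-x$; the cost of the enumeration grows as $2^m$, which is what makes the direct approach impractical.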

A frame-theoretic approach to signal recovery from magnitude measurements has been proposed 
in [1-3], where the authors derive various necessary and sufficient conditions for the uniqueness of 
the solution, as well as various polynomial-time numerical algorithms for very specific choices of $z_i$.
While theoretically quite appealing, the drawbacks are that the methods are (1) either algebraic
in nature, thus severely limiting their stability in the presence of noise or slightly inexact data, or
(2) the number $m$ of measurements is on the order of $n^2$, which is much too large compared to the
number of unknowns. 

This paper follows a very different route and establishes that if the vectors $z_i$ are independently
and uniformly sampled on the unit sphere, then our signal can be recovered exactly from the 
magnitude measurements (1.1) by solving a simple convex program we introduce below; this holds 
with high probability with the proviso that the number of measurements is on the order of n log n. 
Since there are n complex unknowns, we see that the number of samples is nearly minimal. To 
the best of our knowledge, this is the first result establishing that under appropriate conditions, 
the computationally challenging nonconvex problem of reconstructing a signal from magnitude 
measurements is formally equivalent to a convex program in the sense that they are guaranteed to 
have the same unique solution. 

Finally, our methodology is robust with respect to noise in the measurements. To be sure, when 
the data are corrupted by a small amount of noise, we also prove that the recovery error is small. 



1.1 Methodology 

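We introduce some notation that shall be used throughout to explain our methodology. Letting $\mathcal{A}$
be the linear transformation

$$\mathcal{A}: \mathcal{H}^{n \times n} \to \mathbb{R}^m, \qquad X \mapsto \{z_i^* X z_i\}_{1 \le i \le m}, \tag{1.2}$$

which maps Hermitian matrices into real-valued vectors, one can express the data collection $b_i =
|\langle x, z_i\rangle|^2$ as

$$b = \mathcal{A}(xx^*). \tag{1.3}$$

For reference, the adjoint operator $\mathcal{A}^*$ maps real-valued inputs into Hermitian matrices, and is
given by

$$y \mapsto \sum_i y_i\, z_i z_i^*.$$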
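As an illustration of the lifting, here is a small sketch of the measurement map and its adjoint on dense lists of complex numbers. The function names (`inner`, `A`, `A_adj`, `outer`), the sizes, and the random test data are assumptions made for this example only. It shows that the quadratic data $|\langle x, z_i\rangle|^2$ coincide with the linear measurements $z_i^* X z_i$ of the lifted matrix $X = xx^*$.

```python
import random


def inner(x, z):
    """<x, z> = z^* x, i.e. conjugate-linear in z."""
    return sum(xk * zk.conjugate() for xk, zk in zip(x, z))


def A(X, zs):
    """A(X)_i = z_i^* X z_i; real whenever X is Hermitian."""
    n = len(X)
    return [sum(z[j].conjugate() * X[j][k] * z[k]
                for j in range(n) for k in range(n)).real for z in zs]


def A_adj(y, zs):
    """A^*(y) = sum_i y_i z_i z_i^*."""
    n = len(zs[0])
    return [[sum(yi * z[j] * z[k].conjugate() for yi, z in zip(y, zs))
             for k in range(n)] for j in range(n)]


def outer(x):
    """Rank-one lift x x^*."""
    return [[xj * xk.conjugate() for xk in x] for xj in x]


rng = random.Random(0)
n, m = 3, 7


def cgauss():
    return complex(rng.gauss(0, 1), rng.gauss(0, 1))


x = [cgauss() for _ in range(n)]
zs = [[cgauss() for _ in range(n)] for _ in range(m)]

b = [abs(inner(x, z)) ** 2 for z in zs]   # intensity data, quadratic in x
lifted = A(outer(x), zs)                  # the same data, linear in X = x x^*
```

The adjoint can be checked through the identity $\langle \mathcal{A}(X), y\rangle = \operatorname{Tr}(X \mathcal{A}^*(y))$ for Hermitian $X$ and real $y$.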

As observed in [7,10] (see also [17]), the phase retrieval problem can be cast as the matrix recovery
problem

$$\begin{array}{ll} \text{minimize} & \operatorname{rank}(X) \\ \text{subject to} & \mathcal{A}(X) = b. \end{array} \tag{1.4}$$

Indeed, we know that a rank-one solution exists so the optimal $X$ has rank at most one. We then
factorize the solution as $xx^*$ in order to obtain solutions to the phase-retrieval problem. This gives
$x$ up to multiplication by a unit-normed scalar. This is all we can hope for since if $x$ is a solution
to the phase retrieval problem, then $cx$ for any scalar $c \in \mathbb{C}$ obeying $|c| = 1$ is also a solution.^1

Rank minimization is in general NP-hard, and we propose, instead, solving a trace-norm relaxation.
Although this is a fairly standard relaxation in control [4, 18], the idea of casting the phase retrieval
problem as a trace-minimization problem over an affine slice of the positive semidefinite cone is
very recent [7,10]. Formally, we suggest solving

$$\begin{array}{ll} \text{minimize} & \operatorname{Tr}(X) \\ \text{subject to} & \mathcal{A}(X) = b \\ & X \succeq 0. \end{array} \tag{1.5}$$

If the solution has rank one, we factorize it as above to recover our signal. This method which lifts up 
the problem of vector recovery from quadratic constraints into that of recovering a rank-one matrix 
from affine constraints via semidefinite programming is known under the name of PhaseLift [7]. 

The program (1.5) is a semidefinite program (SDP) in standard form, and there is a rapidly growing
list of algorithms for solving problems of this kind as efficiently as possible. The crucial question
is whether and under which conditions the combinatorially hard problem (1.4) and the convex
problem (1.5) are formally equivalent.

1.2 Main result 

In this paper, we consider the simplest and perhaps most natural model of measurement vectors. 
In this statistical model, we simply assume that the vectors $z_i$ are independently and uniformly
distributed on the unit sphere of $\mathbb{C}^n$ or $\mathbb{R}^n$. To be concrete, we distinguish two models.

• The real-valued model. Here, the unknown signal $x$ is real valued and the $z_i$'s are independently
sampled on the unit sphere of $\mathbb{R}^n$.

• The complex-valued model. The signal $x$ is now complex valued and the $z_i$'s are independently
sampled on the unit sphere of $\mathbb{C}^n$.

Our main result is that the convex program recovers x exactly (up to global phase) provided the 
number m of magnitude measurements is on the order of n log n. 

Theorem 1.1 Consider an arbitrary signal $x$ in $\mathbb{R}^n$ or $\mathbb{C}^n$ and suppose that the number of
measurements obeys $m \ge c_0\, n \log n$, where $c_0$ is a sufficiently large constant. Then in both the real and
complex cases, the solution to the trace-minimization program is exact with high probability in the
sense that (1.5) has a unique solution obeying

$$\hat{X} = xx^*. \tag{1.6}$$

This holds with probability at least $1 - 3e^{-\gamma m/n}$, where $\gamma$ is a positive absolute constant.

^1 When the solution is unique up to multiplication by such a scalar, we shall say that unicity holds up to global
phase.

Expressed differently, Theorem 1.1 establishes a rigorous equivalence between a class of phase 
retrieval problems and a class of semidefinite programs. Clearly, any phase retrieval algorithm, no 
matter how complicated or intractable, would need at least $2n$ quadratic measurements to recover
a complex-valued object $x \in \mathbb{C}^n$. In fact recent results, compare Theorem II in [12], show that for
complex-valued signals, one needs at least $3n - 2$ intensity measurements to guarantee uniqueness
of the solution to (1.4). Further, Balan, Casazza and Edidin have shown that with probability 1,
$4n - 2$ generic measurement vectors (which includes the case of random uniform vectors) suffice
for uniqueness in the complex case [3]. Hence, Theorem 1.1 shows that the oversampling factor for 
perfect recovery via convex optimization is rather minimal. 

To be absolutely complete, we would like to emphasize that our discrete signals $x$ may represent
1D, 2D, 3D and higher dimensional objects. For instance, in 2D the vector $x \in \mathbb{C}^n$ might be a
family of samples of the form $x[t_1, t_2]$, $1 \le t_1 \le n_1$, $1 \le t_2 \le n_2$, and with $n = n_1 n_2$, so that $x$ is a
discrete 2D image. In this case, we would record the squared magnitudes of the dot product

$$\langle x, z_i\rangle = \sum_{t_1, t_2} x[t_1, t_2]\, \overline{z_i[t_1, t_2]}.$$

Hence, our framework and theory apply to one- or multi-dimensional signals.



1.3 Geometry 

We find it rather remarkable that the only solution to (1.5) is $\hat{X} = xx^*$. To see why this is perhaps
unexpected, suppose for simplicity that the trace of the solution were known (we might be given
some side information or just have additional measurements giving us this information) and equal
to 1, say. In this case, the objective functional is of course constant over the feasible set, and our
problem reduces to solving the feasibility problem

$$\text{find } X \quad \text{such that} \quad \mathcal{A}(X) = b, \; X \succeq 0, \tag{1.7}$$

with again the proviso that knowledge of $\mathcal{A}(X)$ determines $\operatorname{Tr}(X)$ (equal to $\operatorname{Tr}(xx^*) = \|x\|_2^2 = 1$).
In this context, our main theorem states that $xx^*$ is the unique feasible point. In other words,
there is no other positive semidefinite matrix $X$ in the affine space $\mathcal{A}(X) = b$. Naively, we
would not expect this affine space of enormous dimension (it is of co-dimension about $n \log n$
and thus of dimension $n^2 - O(n \log n)$ in the complex case) to intersect the positive semidefinite
cone in only one point. Indeed, counting degrees of freedom suggests that there are infinitely
many candidates in the intersection. The reason why this is not the case, however, is precisely
because there is a feasible solution with low rank. Indeed, the slice of the positive semidefinite cone
$\{X : X \succeq 0\} \cap \{\operatorname{Tr}(X) = 1\}$ is quite 'pointy' at $xx^*$ and it is, therefore, possible for the affine
space $\{\mathcal{A}(X) = b\}$ to be tangent even though it is of very small codimension.

Figure 1: Representation of the affine space $\mathcal{A}(X) = b$ (gray) and of the semidefinite cone
(red). These two sets are drawn so that they are tangent.

Figure 1 represents this geometry: in the example shown, the affine space $\mathcal{A}(X) = b$ is tangent
to the positive semidefinite cone at the point $xx^*$.

Mathematically speaking, phase retrieval is a problem in algebraic geometry since we are trying to 
find a solution to a set of polynomial equations. The originality in our approach is that we do not 
use tools from this field. For instance, we prove that there is no other positive semidefinite matrix
$X$ in the affine space $\mathcal{A}(X) = b$, or equivalently, that a certain system of polynomial equations and
inequalities (a symmetric matrix is positive semidefinite if and only if all of its principal
minors are nonnegative) only has one solution; this is a fact that general techniques from algebraic
geometry appear not to detect.



1.4 Stability 

In the real world, measurements are contaminated by noise. Using the frameworks developed in [8] 
and [14], it is possible to extend Theorem 1.1 to accommodate noisy measurements. One could 
consider a variety of noise models as discussed in [7] but we work here with a simple generic model 
in which we observe 

$$b_i = |\langle x, z_i\rangle|^2 + \nu_i, \tag{1.8}$$

where $\nu_i$ is a noise term with bounded $\ell_2$ norm, $\|\nu\|_2 \le \epsilon$. This model is nonstandard since the
usual statistical linear model posits a relationship of the form $b_i = \langle x, z_i\rangle + \nu_i$ in which the mean
response is a linear function of the unknown signal, not a quadratic function. Furthermore, we
prefer studying (1.8) rather than the related model $b_i = |\langle x, z_i\rangle| + \nu_i$ (the modulus is not squared)
because in many applications of interest in optics and other areas of physics, one can measure
squared magnitudes or intensities, not magnitudes.
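
We now consider the solution to

$$\begin{array}{ll} \text{minimize} & \operatorname{Tr}(X) \\ \text{subject to} & \|\mathcal{A}(X) - b\|_2 \le \epsilon \\ & X \succeq 0. \end{array} \tag{1.9}$$

We do not claim that the solution $\hat{X}$ has low rank so we suggest estimating $x$ by extracting the largest rank-1
component. Write $\hat{X}$ as

$$\hat{X} = \sum_{k=1}^n \hat\lambda_k \hat{u}_k \hat{u}_k^*, \qquad \hat\lambda_1 \ge \cdots \ge \hat\lambda_n \ge 0,$$

and set

$$\hat{x} = \sqrt{\hat\lambda_1}\, \hat{u}_1.$$

We prove the following estimate.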



Theorem 1.2 Fix $x \in \mathbb{C}^n$ or $\mathbb{R}^n$ and assume the $z_i$'s are uniformly sampled on the sphere of radius
$\sqrt{n}$. Under the hypotheses of Theorem 1.1, the solution to (1.9) obeys ($\|X\|_2$ is the Frobenius norm
of $X$)

$$\|\hat{X} - xx^*\|_2 \le C_0\, \epsilon \tag{1.10}$$

for some positive numerical constant $C_0$. We also have

$$\|\hat{x} - e^{i\phi} x\|_2 \le C_0 \min(\|x\|_2, \epsilon/\|x\|_2) \tag{1.11}$$

for some $\phi \in [0, 2\pi]$. Both these estimates hold with nearly the same probability as in the noiseless
case.



Thus our approach also provides stable recovery in presence of noise. This important property 
is not shared by other reconstruction methods, which are of a more algebraic nature and rely on 
particular properties of the measurement vectors, such as the methods in [2,3,12], as well as the 
methods that appear implicitly in Theorem 3.1 and Theorem 3.3 of [7]. 

We note that one can further improve the accuracy of the solution $\hat{x}$ by "debiasing" it. We replace
$\hat{x}$ by its rescaled version $s\hat{x}$ where $s = \sqrt{\sum_{k=1}^n \hat\lambda_k}\,/\,\|\hat{x}\|_2$. This corrects for the energy leakage
occurring when $\hat{X}$ is not exactly a rank-1 solution, which could cause the norm of $\hat{x}$ to be smaller
than that of the actual solution. Other corrections are of course possible.
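The rank-one extraction and the debiasing step can be sketched as follows. This is an illustrative sketch rather than the authors' code: it assumes a small real symmetric positive semidefinite matrix, uses plain power iteration (a swapped-in technique; any eigensolver would do) started from the all-ones vector, which is assumed not to be orthogonal to the leading eigenvector, and the names `top_eigpair` and `debiased_estimate` are hypothetical.

```python
import math


def top_eigpair(X, iters=200):
    """Power iteration for the leading eigenpair of a real symmetric PSD matrix."""
    n = len(X)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(X[j][k] * v[k] for k in range(n)) for j in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    lam = sum(v[j] * X[j][k] * v[k] for j in range(n) for k in range(n))
    return lam, v


def debiased_estimate(X):
    """Return (s * x_hat, s) with x_hat = sqrt(lambda_1) u_1 and s = sqrt(sum_k lambda_k) / ||x_hat||_2."""
    lam1, u1 = top_eigpair(X)
    x_hat = [math.sqrt(lam1) * c for c in u1]
    trace = sum(X[k][k] for k in range(len(X)))   # equals the sum of the eigenvalues
    s = math.sqrt(trace) / math.sqrt(sum(c * c for c in x_hat))
    return [s * c for c in x_hat], s


# Toy matrix 4*u*u^T + 1*v*v^T with u = (0.6, 0.8), v = (-0.8, 0.6): not exactly rank one.
X = [[2.08, 1.44], [1.44, 2.92]]
x_debiased, s = debiased_estimate(X)
```

In this toy case the leading component carries eigenvalue 4 out of a total trace of 5, so the rescaling factor is $s = \sqrt{5}/2$ and the debiased estimate has squared norm equal to the trace.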



1.5 Organization of the paper 



The remainder of the paper is organized as follows. Subsection 1.6 introduces some notation used
throughout the paper. In Section 2 we present the main architecture of the proof of Theorem 1.1,
which comprises two key ingredients: approximate $\ell_1$ isometries and approximate dual certificates.
Section 3 is devoted to establishing approximate $\ell_1$ isometries. In Section 4, we construct
approximate dual certificates and complete the proof of Theorem 1.1 in the real-valued case. Section 5
shows how the proof for the real-valued case can be adapted to the complex-valued case. Section 6
is concerned with the proof of Theorem 1.2. Numerical simulations, illustrating our theoretical
results, are presented in Section 7. We conclude the paper with a short discussion in Section 8.

1.6 Notations

It is useful to introduce notations that shall be used throughout the paper. Matrices and vectors
are denoted in boldface (such as $X$ or $x$), while individual entries of a vector or matrix are denoted
in normal font; e.g. the $i$th entry of $x$ is $x_i$. For matrices, we define

$$\|X\|_p = \Big[\sum_i \sigma_i^p(X)\Big]^{1/p}$$

(where $\sigma_i(X)$ denotes the $i$th singular value of $X$), so that $\|X\|_1$ is the nuclear norm, $\|X\|_2$ is the
Frobenius norm and $\|X\|_\infty$ is the operator norm also denoted by $\|X\|$. For vectors, $\|x\|_p$ is the
usual $\ell_p$ norm. We denote the $n - 1$ dimensional sphere by $S^{n-1}$, i.e. the set $\{x \in \mathbb{R}^n : \|x\|_2 = 1\}$.

Next, we define $T_x$ to be the set of symmetric matrices of the form

$$T_x = \{X = xy^* + yx^* : y \in \mathbb{R}^n\} \tag{1.12}$$

and denote by $T_x^\perp$ its orthogonal complement. Note that $X \in T_x^\perp$ if and only if both the column
and row spaces of $X$ are perpendicular to $x$. Further, the operator $\mathcal{P}_{T_x}$ is the orthogonal projector
onto $T_x$ and similarly for $\mathcal{P}_{T_x^\perp}$. We shall almost always use $X_{T_x}$ as a shorthand for $\mathcal{P}_{T_x}(X)$.

Finally, we will abuse language and say that a symmetric matrix $H$ is feasible if and only if $xx^* + H$
is feasible for our problem (1.5). This means that $H$ obeys

$$xx^* + H \succeq 0 \quad \text{and} \quad \mathcal{A}(H) = 0. \tag{1.13}$$



2 Architecture of the Proof 

In this section, we introduce the main architecture of the argument and defer the proofs of crucial
intermediate results to later sections. We shall prove Theorem 1.1 in the real case first for ease of
exposition. Then in Section 5, we shall explain how to modify the argument to the complex and
more general case.

Suppose then that $x \in \mathbb{R}^n$ and that the $z_i$'s are sampled on the unit sphere. It is clear that we may
assume without loss of generality that $x$ is unit-normed. Further, since the uniform distribution on
the unit sphere is rotationally invariant, it suffices to prove the theorem in the case where $x = e_1$.
Indeed, we can write any unit vector $x$ as $x = Ue_1$ where $U$ is orthogonal. Since

$$|\langle x, z_i\rangle|^2 = |\langle Ue_1, z_i\rangle|^2 = |\langle e_1, U^* z_i\rangle|^2 \overset{d}{=} |\langle e_1, z_i\rangle|^2,$$

the problem is the same as that of finding $e_1$. We henceforth assume that $x = e_1$.

Finally, the theorem can be equivalently stated in the case where the $z_i$'s are i.i.d. copies of a
white noise vector $z \sim \mathcal{N}(0, I)$ with independent standard normals as components. Indeed, if
$z_i \sim \mathcal{N}(0, I)$,

$$|\langle x, z_i\rangle|^2 = b_i \iff |\langle x, u_i\rangle|^2 = b_i / \|z_i\|_2^2,$$

where $u_i = z_i/\|z_i\|_2$ is uniformly sampled on the unit sphere. Since $\|z_i\|_2$ does not vanish with
probability one, establishing the theorem for Gaussian vectors establishes it for uniformly sampled
vectors and vice versa. From now on, we assume $z_i$ i.i.d. $\mathcal{N}(0, I)$.
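The rescaling between the Gaussian and the spherical models can be checked numerically on a single draw. The snippet below is a minimal sketch; the dimension, the seed, and the variable names are assumptions made for this illustration.

```python
import math
import random

rng = random.Random(1)
n = 4
x = [rng.gauss(0, 1) for _ in range(n)]
z = [rng.gauss(0, 1) for _ in range(n)]        # white noise vector z ~ N(0, I)

norm_z = math.sqrt(sum(c * c for c in z))
u = [c / norm_z for c in z]                    # normalized vector on the unit sphere


def dot(a, c):
    return sum(ai * ci for ai, ci in zip(a, c))


b_gauss = dot(x, z) ** 2      # measurement taken with the Gaussian vector
b_sphere = dot(x, u) ** 2     # measurement taken with the normalized vector
```

Dividing the Gaussian measurement by $\|z\|_2^2$ reproduces the spherical measurement exactly, so data from one model can be converted into data from the other.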

2.1 Key lemma

The set $T := T_{e_1}$ defined in (1.12) may be interpreted as the tangent space at $e_1e_1^*$ to the manifold
of symmetric matrices of rank 1. Now standard duality arguments in semidefinite programming
show that a sufficient (and nearly necessary) condition for $xx^*$ to be the unique solution to (1.5)
is this:

• the restriction of $\mathcal{A}$ to $T$ is injective ($X \in T$ and $\mathcal{A}(X) = 0 \Rightarrow X = 0$),

• and there exists a dual certificate $Y$ in the range of $\mathcal{A}^*$ obeying^2

$$Y_T = e_1e_1^* \quad \text{and} \quad Y_T^\perp \prec I_T^\perp. \tag{2.1}$$

The proof is straightforward and omitted. Our strategy to prove Theorem 1.1 hinges on the fact
that a strengthening of the injectivity property allows us to relax the properties of the dual certificate,
as in the approach pioneered in [13] for matrix completion. We establish the crucial lemma below.

Lemma 2.1 Suppose that the mapping $\mathcal{A}$ obeys the following two properties: for all positive
semidefinite matrices $X$,

$$m^{-1}\|\mathcal{A}(X)\|_1 \le (1 + 1/9)\|X\|_1; \tag{2.2}$$

and for all matrices $X \in T$

$$m^{-1}\|\mathcal{A}(X)\|_1 \ge 0.94(1 - 1/9)\|X\|. \tag{2.3}$$

Suppose further that there exists $Y$ in the range of $\mathcal{A}^*$ obeying

$$\|Y_T - e_1e_1^*\|_2 \le 1/3 \quad \text{and} \quad \|Y_T^\perp\| \le 1/2. \tag{2.4}$$

Then $e_1e_1^*$ is the unique minimizer to (1.5).

The first property (2.2) is reminiscent of the (one-sided) RIP property in the area of compressed 
sensing [9]. The difference is that it is expressed in the 1-norm rather than the 2-norm. Having 
said this, we note that RIP-1 properties have also been used in the compressed sensing literature, 
see [6] for example. We use this property instead of a property about $\|\mathcal{A}(X)\|_2$, because we actually
believe that a RIP property in the 2-norm does not hold here because $\|\mathcal{A}(X)\|_2$ involves fourth
moments of Gaussian variables. The second property (2.3) is a form of local RIP-1 since it holds
only for matrices in $T$.

We would like to emphasize that the bound for the dual certificate in (2.4) is loose in the sense
that $Y_T$ and $e_1e_1^*$ may not be that close, a fact which will play a crucial role in our proof. This is
in stark contrast with the work of David Gross [13], which requires a very tight approximation. 

^2 The notation $A \prec B$ means that $B - A$ is positive definite.

2.2 Proof of Lemma 2.1



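We need to show that there is no feasible $xx^* + H \ne xx^*$ with $\operatorname{Tr}(xx^* + H) \le \operatorname{Tr}(xx^*)$. Consider
then a feasible $H \ne 0$ obeying $\operatorname{Tr}(H) \le 0$, write

$$H = H_T + H_T^\perp,$$

and observe that

$$0 = m^{-1}\|\mathcal{A}(H)\|_1 = m^{-1}\|\mathcal{A}(H_T)\|_1 - m^{-1}\|\mathcal{A}(H_T^\perp)\|_1. \tag{2.5}$$

Now it is clear that $xx^* + H \succeq 0 \Rightarrow H_T^\perp \succeq 0$ and, therefore, (2.2) gives

$$m^{-1}\|\mathcal{A}(H_T^\perp)\|_1 \le (1 + \delta)\operatorname{Tr}(H_T^\perp)$$

for some $\delta \le 1/9$. Also, $\operatorname{Tr}(H_T) \le -\operatorname{Tr}(H_T^\perp) \le 0$, which implies that $|\operatorname{Tr}(H_T)| \ge \operatorname{Tr}(H_T^\perp)$. We
then show that the operator and Frobenius norms of $H_T$ must nearly be the same.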

Lemma 2.2 Any feasible matrix $H$ such that $\operatorname{Tr}(H) \le 0$ must obey

$$\|H_T\| \le \|H_T\|_2 \le \sqrt{\tfrac{17}{16}}\, \|H_T\|.$$

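Proof Since the matrix $H_T$ has rank at most 2 and cannot be negative definite, it is of the form

$$-\lambda(u_1u_1^* - t\,u_2u_2^*),$$

where $u_1$ and $u_2$ are orthonormal eigenvectors, $\lambda \ge 0$ and $t \in [0, 1]$. We claim that we cannot have
$t \ge 1/4$.^3 Suppose the contrary and fix $t \ge 1/4$. By (2.3), we know that

$$m^{-1}\|\mathcal{A}(H_T)\|_1 \ge 0.94(1 - \delta)\|H_T\|.$$

Further, since

$$\|H_T\| = \lambda = \frac{|\operatorname{Tr}(H_T)|}{1 - t} \ge \frac{4}{3}\, |\operatorname{Tr}(H_T)|$$

for $t \ge 1/4$, it holds that

$$0 = m^{-1}\|\mathcal{A}(H_T)\|_1 - m^{-1}\|\mathcal{A}(H_T^\perp)\|_1 \ge \frac{5}{4}(1 - \delta)\, |\operatorname{Tr}(H_T)| - (1 + \delta)\operatorname{Tr}(H_T^\perp).$$

The right-hand side above is positive if $\operatorname{Tr}(H_T^\perp) < \frac{5(1-\delta)}{4(1+\delta)} |\operatorname{Tr}(H_T)|$, so that we may assume that

$$\operatorname{Tr}(H_T^\perp) \ge \frac{5(1-\delta)}{4(1+\delta)}\, |\operatorname{Tr}(H_T)|.$$

Since $|\operatorname{Tr}(H_T)| \ge \operatorname{Tr}(H_T^\perp)$, this gives

$$0 \ge \Big[\frac{5}{4}(1 - \delta) - (1 + \delta)\Big] \operatorname{Tr}(H_T^\perp).$$

If $\delta < 1/9$, the only way this can happen is if $\operatorname{Tr}(H_T^\perp) = 0 \Rightarrow H_T^\perp = 0$. So we would have $H = H_T$
of rank 2 and $\mathcal{A}(H_T) = 0$. Clearly, (2.3) implies that $H_T = 0$.

Now that it is established that $t \le 1/4$, the chain of inequalities follows from the relation between
the eigenvalues of $H_T$. ■

^3 The choice of 1/4 is somewhat arbitrary here.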

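To conclude the proof of Lemma 2.1, we show that the existence of an inexact dual certificate rules
out the existence of matrices obeying the conditions of Lemma 2.2. From

$$0.94(1 - \delta)\|H_T\| \le m^{-1}\|\mathcal{A}(H_T)\|_1 = m^{-1}\|\mathcal{A}(H_T^\perp)\|_1 \le (1 + \delta)\operatorname{Tr}(H_T^\perp),$$

we conclude that

$$\operatorname{Tr}(H_T^\perp) \ge 0.94\, \frac{1 - \delta}{1 + \delta}\, \|H_T\| \ge 0.94\, \frac{1 - \delta}{1 + \delta} \sqrt{\frac{16}{17}}\, \|H_T\|_2, \tag{2.6}$$

where we used Lemma 2.2. Next,

$$\begin{aligned}
0 \ge \operatorname{Tr}(H_T) + \operatorname{Tr}(H_T^\perp) &= \langle H, e_1e_1^*\rangle + \operatorname{Tr}(H_T^\perp) \\
&= \langle H, e_1e_1^* - Y\rangle + \langle H, Y\rangle + \operatorname{Tr}(H_T^\perp) \\
&= \langle H_T, e_1e_1^* - Y_T\rangle - \langle H_T^\perp, Y_T^\perp\rangle + \operatorname{Tr}(H_T^\perp) \\
&\ge \tfrac{1}{2}\operatorname{Tr}(H_T^\perp) - \tfrac{1}{3}\|H_T\|_2.
\end{aligned}$$

The third line above follows from $\langle H, Y\rangle = 0$ and the fourth from Cauchy-Schwarz together with
$|\langle H_T^\perp, Y_T^\perp\rangle| \le \tfrac{1}{2}\operatorname{Tr}(H_T^\perp)$. Hence, it follows from (2.6) that

$$0 \ge \Big[\frac{0.94}{2}\, \frac{1 - \delta}{1 + \delta} \sqrt{\frac{16}{17}} - \frac{1}{3}\Big] \|H_T\|_2.$$

Since the numerical factor is positive for $\delta < 0.155$, the only way this can happen is if $H_T = 0$. In
turn, $m^{-1}\|\mathcal{A}(H_T^\perp)\|_1 = 0 \ge (1 - \delta)\operatorname{Tr}(H_T^\perp)$ which gives $H_T^\perp = 0$. This concludes the proof.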

3 Approximate $\ell_1$ Isometries

We have seen that in order to prove our main result, it suffices to show 1) that the measurement
operator $\mathcal{A}$ enjoys approximate isometry properties (in an $\ell_1$ sense) when acting on low-rank
matrices and 2) that an inexact dual certificate exists. This section focuses on the former and establishes
that both (2.2) and (2.3) hold with high probability. In fact, we shall prove stronger results than
what is strictly required.

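Lemma 3.1 Fix any $\delta > 0$ and assume $m \ge 16\,\delta^{-2} n$. Then for all unit vectors $u$,

$$(1 - \delta) \le \frac{1}{m}\|\mathcal{A}(uu^*)\|_1 \le (1 + \delta) \tag{3.1}$$

on an event $E_\delta$ of probability at least $1 - 2e^{-m\epsilon^2/2}$, where $\delta/4 = \epsilon^2 + \epsilon$. On the same event,

$$(1 - \delta)\|X\|_1 \le \frac{1}{m}\|\mathcal{A}(X)\|_1 \le (1 + \delta)\|X\|_1$$

for all positive semidefinite matrices. The right inequality holds for all matrices.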

Proof This lemma has an easy proof. Let $Z$ be the $m \times n$ matrix with $z_i$'s as rows. Then

$$\|\mathcal{A}(uu^*)\|_1 = \sum_i |\langle z_i, u\rangle|^2 = \|Zu\|_2^2$$

so that

$$\sigma_{\min}^2(Z) \le \|\mathcal{A}(uu^*)\|_1 \le \sigma_{\max}^2(Z).$$

The claim is a consequence of well-known deviation bounds concerning the singular values of
Gaussian random matrices [21], namely,

$$\mathbb{P}\big(\sigma_{\max}(Z) > \sqrt{m} + \sqrt{n} + t\big) \le e^{-t^2/2},$$

$$\mathbb{P}\big(\sigma_{\min}(Z) < \sqrt{m} - \sqrt{n} - t\big) \le e^{-t^2/2}.$$

The conclusion follows from taking $m \ge \epsilon^{-2} n$ and $t = \sqrt{m}\,\epsilon$. For the second part of the lemma,
observe that $X = \sum_j \lambda_j u_j u_j^*$ with nonnegative eigenvalues $\lambda_j$ so that

$$\|\mathcal{A}(X)\|_1 = \sum_i \sum_j \lambda_j |\langle u_j, z_i\rangle|^2 = \sum_j \lambda_j \|\mathcal{A}(u_j u_j^*)\|_1.$$

The claim follows from (3.1). The last claim is a consequence of $\|\mathcal{A}(X)\|_1 \le \sum_j |\lambda_j| \sum_i |\langle u_j, z_i\rangle|^2$
together with $\sum_j |\lambda_j| = \|X\|_1$. ■
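The concentration behind (3.1) is easy to observe empirically. The following Monte Carlo sketch, in the real-valued Gaussian model, evaluates $m^{-1}\|\mathcal{A}(uu^*)\|_1 = m^{-1}\sum_i \langle z_i, u\rangle^2$ for one fixed unit vector; the sizes, the seed, and the name `avg_intensity` are assumptions of this example, and checking a single $u$ is of course weaker than the uniform statement of the lemma.

```python
import math
import random


def avg_intensity(u, m, rng):
    """(1/m) * ||A(uu^*)||_1 = (1/m) * sum_i <z_i, u>^2 for real Gaussian z_i."""
    total = 0.0
    for _ in range(m):
        z = [rng.gauss(0, 1) for _ in u]
        total += sum(zk * uk for zk, uk in zip(z, u)) ** 2
    return total / m


rng = random.Random(2)
n, m = 5, 20000
u = [1 / math.sqrt(n)] * n           # an arbitrary unit vector
ratio = avg_intensity(u, m, rng)     # expected to be close to 1 when m >> n
```

Each term $\langle z_i, u\rangle^2$ has mean 1 and variance 2, so the average fluctuates around 1 at the scale $\sqrt{2/m}$.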

Our next result is concerned with the mapping of rank-2 matrices. 

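Lemma 3.2 Fix $\delta > 0$. Then there are positive numerical constants $c_0$ and $\gamma_0$ such that if $m \ge
c_0\, [\delta^{-2} \log \delta^{-1}]\, n$, $\mathcal{A}$ obeys the following property with probability at least $1 - 3e^{-\gamma_0 m \delta^2}$: for any
symmetric rank-2 matrix $X$,

$$\frac{1}{m}\|\mathcal{A}(X)\|_1 \ge 0.94(1 - \delta)\|X\|. \tag{3.2}$$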

Proof By homogeneity, it suffices to consider the case where $\|X\| = 1$. Consider then a rank-2
matrix $X$ with eigenvalue decomposition $X = u_1u_1^* - t\,u_2u_2^*$ with $t \in [-1, 1]$ and orthonormal $u_i$'s.
Note that for $t < 0$, Lemma 3.1 already claims a tighter lower bound so it only suffices to consider
$t \in [0, 1]$. We have

$$\frac{1}{m}\|\mathcal{A}(X)\|_1 = \frac{1}{m}\sum_{i=1}^m \big|\, |\langle u_1, z_i\rangle|^2 - t\, |\langle u_2, z_i\rangle|^2 \big| = \frac{1}{m}\sum_{i=1}^m \xi_i,$$

where the $\xi_i$'s are independent copies of the random variable

$$\xi = |Z_1^2 - tZ_2^2|$$

in which $Z_1$ and $Z_2$ are independent standard normal variables. This comes from the fact that
$\langle u_1, z_i\rangle$ and $\langle u_2, z_i\rangle$ are independent standard normal. We calculate below that

$$\mathbb{E}\,\xi = f(t) = \frac{2}{\pi}\Big(2\sqrt{t} + (1 - t)\big(\pi/2 - 2\arctan(\sqrt{t})\big)\Big). \tag{3.3}$$

The graph of this function is shown in Figure 2; we check that $f(t) \ge 0.94$ for all $t \in [0, 1]$.
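The lower bound on $f$ can be checked numerically from the closed form (3.3). The sketch below evaluates $f$ on a uniform grid of $[0, 1]$; the grid resolution and the variable names are choices made for this illustration, and a grid search is only a sanity check, not a proof.

```python
import math


def f(t):
    """Closed form (3.3) for E|Z1^2 - t*Z2^2| with independent standard normals."""
    r = math.sqrt(t)
    return (2 / math.pi) * (2 * r + (1 - t) * (math.pi / 2 - 2 * math.atan(r)))


grid = [k / 1000 for k in range(1001)]
t_min = min(grid, key=f)     # grid point where f is smallest
f_min = f(t_min)             # smallest value of f on the grid
```

The endpoints are $f(0) = 1$ and $f(1) = 4/\pi$, and the minimum on the grid is attained in the interior of the interval, slightly above 0.94.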

We now need a deviation bound concerning the fluctuation of $m^{-1}\sum_i \xi_i$ around its mean and this
is achieved by classical Chernoff bounds. Note that $\xi \le Z_1^2 + |t| Z_2^2$ is a sub-exponential variable
and thus, $\|\xi\|_{\psi_1} := \sup_{p \ge 1} p^{-1} [\mathbb{E}|\xi|^p]^{1/p}$ is finite.^4

^4 It would be possible to compute a bound on this quantity but we will not pursue this at the moment.

Figure 2: $f(t) = \mathbb{E}|Z_1^2 - tZ_2^2|$ as a function of $t$.



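Lemma 3.3 (Bernstein-type inequality [21]) Let $X_1, \ldots, X_m$ be i.i.d. sub-exponential random
variables. Then

$$\mathbb{P}\Big(\Big|\frac{1}{m}\sum_{i=1}^m X_i - \mathbb{E}X_1\Big| \ge \epsilon\Big) \le 2\exp\Big(-c_0\, m \min\Big(\frac{\epsilon^2}{\|X\|_{\psi_1}^2}, \frac{\epsilon}{\|X\|_{\psi_1}}\Big)\Big),$$

in which $c_0$ is a positive numerical constant.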

We have thus established that for a fixed $X$,

$$m^{-1}\|\mathcal{A}(X)\|_1 \ge (0.94 - \epsilon_0)\|X\|$$

with probability at least $1 - 2e^{-\gamma_0 m \epsilon_0^2}$ (provided $\epsilon_0 \le \|\xi\|_{\psi_1}$, which we assume).

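To complete the argument, let $S_\epsilon$ be an $\epsilon$-net of the unit sphere, $T_\epsilon$ be an $\epsilon$-net of $[0, 1]$, and set

$$\mathcal{M}_\epsilon = \{X = u_1u_1^* - t\,u_2u_2^* : (u_1, u_2, t) \in S_\epsilon \times S_\epsilon \times T_\epsilon\}.$$

Since $|S_\epsilon| \le (3/\epsilon)^n$, we have

$$|\mathcal{M}_\epsilon| \le (3/\epsilon)^{2n+1}.$$

Now for any $X = uu^* - t\,vv^*$, consider the approximation $X_0 = u_0u_0^* - t_0 v_0v_0^* \in \mathcal{M}_\epsilon$, where
$\|u - u_0\|_2$, $\|v - v_0\|_2$ and $|t - t_0|$ are each at most $\epsilon$. We claim that

$$\|X - X_0\|_1 \le 9\epsilon, \tag{3.4}$$

and postpone the short proof. On the intersection of $E_1 = \{m^{-1}\|\mathcal{A}(X)\|_1 \le (1 + \delta_1)\|X\|_1, \text{ for all } X\}$
with $E_2 := \{m^{-1}\|\mathcal{A}(X_0)\|_1 \ge (0.94 - \epsilon)\|X_0\|, \text{ for all } X_0 \in \mathcal{M}_\epsilon\}$,

$$\begin{aligned}
m^{-1}\|\mathcal{A}(X)\|_1 &\ge m^{-1}\|\mathcal{A}(X_0)\|_1 - m^{-1}\|\mathcal{A}(X - X_0)\|_1 \\
&\ge (0.94 - \epsilon)\|X_0\| - 9(1 + \delta_1)\epsilon \\
&\ge (0.94 - \epsilon)(\|X\| - \|X_0 - X\|) - 9(1 + \delta_1)\epsilon \\
&\ge (0.94 - \epsilon)(1 - 5\epsilon) - 9(1 + \delta_1)\epsilon \\
&\ge 0.94 - (15 + 9\delta_1)\epsilon,
\end{aligned}$$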
which is the desired bound by setting $0.94\,\delta = (15 + 9\delta_1)\epsilon$. In conclusion, set $\delta_1 = 1/2$ and
take $\epsilon = 0.94\,\delta/20$. Then $E_1$ holds with probability at least $1 - O(e^{-\gamma_1 m \epsilon^2})$ provided $m$ obeys
the condition of the theorem. Further, Lemma 3.3 (together with a union bound over $\mathcal{M}_\epsilon$) states that
$E_2$ holds with probability at least $1 - 2e^{-\gamma_2 m}$. This concludes the proof provided we check (3.4).

We begin with

$$\|X - X_0\|_1 \le \|uu^* - u_0u_0^*\|_1 + |t - t_0|\, \|vv^*\|_1 + |t_0|\, \|vv^* - v_0v_0^*\|_1.$$

Now

$$\|uu^* - u_0u_0^*\|_1 \le 2\|uu^* - u_0u_0^*\| \le 4\|u - u_0\|_2,$$

where the first inequality follows from the fact that $uu^* - u_0u_0^*$ is of rank at most 2, and the second
follows from

$$\|uu^* - u_0u_0^*\| = \sup_{\|x\|_2 = 1} \big|\langle u, x\rangle^2 - \langle u_0, x\rangle^2\big| = \sup_{\|x\|_2 = 1} \big|\langle u - u_0, x\rangle \langle u + u_0, x\rangle\big| \le \|u - u_0\|_2 \|u + u_0\|_2 \le 2\|u - u_0\|_2.$$

Similarly, $\|vv^* - v_0v_0^*\|_1 \le 4\epsilon$ and this concludes the proof.^5

Lemma 3.4 Let $Z_1$ and $Z_2$ be independent $\mathcal{N}(0, 1)$ variables and $t \in [0, 1]$. We have

$$\mathbb{E}|Z_1^2 - tZ_2^2| = f(t),$$

where $f(t)$ is given by (3.3).



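Proof Set

$$\rho = \frac{1 - t}{1 + t} \quad \text{and} \quad \cos\theta = \rho,$$

in which $\theta \in [0, \pi/2]$. By using polar coordinates, we have

$$\begin{aligned}
\mathbb{E}|Z_1^2 - tZ_2^2| &= \frac{1}{2\pi}\int_0^\infty r^3 e^{-r^2/2}\, dr \int_0^{2\pi} |\cos^2\phi - t\sin^2\phi|\, d\phi \\
&= \frac{1}{\pi}\int_0^{2\pi} |\cos^2\phi - t\sin^2\phi|\, d\phi \\
&= \frac{2}{\pi}\int_0^{\pi} |\cos^2\phi - t\sin^2\phi|\, d\phi.
\end{aligned}$$

Now using the identities $\cos^2\phi = (1 + \cos 2\phi)/2$ and $\sin^2\phi = (1 - \cos 2\phi)/2$, we have

$$\begin{aligned}
\mathbb{E}|Z_1^2 - tZ_2^2| &= \frac{1 + t}{\pi}\int_0^{\pi} |\cos 2\phi + \rho|\, d\phi \\
&= \frac{1 + t}{2\pi}\int_0^{2\pi} |\cos\phi + \rho|\, d\phi \\
&= \frac{1 + t}{\pi}\int_0^{\pi} |\cos\phi + \rho|\, d\phi \\
&= \frac{1 + t}{\pi}\int_0^{\pi} |\rho - \cos\phi|\, d\phi \\
&= \frac{1 + t}{\pi}\Big[\int_0^{\theta} (\cos\phi - \rho)\, d\phi + \int_{\theta}^{\pi} (\rho - \cos\phi)\, d\phi\Big] \\
&= \frac{2}{\pi}(1 + t)\big[\sin\theta + \rho(\pi/2 - \theta)\big].
\end{aligned}$$

We recognize (3.3), since $(1 + t)\sin\theta = 2\sqrt{t}$ and $\theta = 2\arctan(\sqrt{t})$. ■

^5 The careful reader will remark that we have also used $\|X - X_0\| \le 5\epsilon$, which also follows from our calculations.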
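As a sanity check of this computation, one can compare the closed form with a direct numerical evaluation of the angular integral obtained after the polar change of variables. The sketch below uses a plain midpoint rule; the function names `f_closed` and `f_quadrature` and the number of quadrature steps are assumptions of this example.

```python
import math


def f_closed(t):
    """Closed form (3.3)."""
    r = math.sqrt(t)
    return (2 / math.pi) * (2 * r + (1 - t) * (math.pi / 2 - 2 * math.atan(r)))


def f_quadrature(t, steps=20000):
    """Midpoint rule for (1/pi) * integral over [0, 2*pi] of |cos^2(phi) - t*sin^2(phi)|."""
    h = 2 * math.pi / steps
    total = 0.0
    for k in range(steps):
        phi = (k + 0.5) * h
        total += abs(math.cos(phi) ** 2 - t * math.sin(phi) ** 2)
    return total * h / math.pi


gaps = [abs(f_closed(t) - f_quadrature(t)) for t in (0.0, 0.25, 0.5, 1.0)]
```

The prefactor $1/\pi$ comes from the radial integral $\int_0^\infty r^3 e^{-r^2/2}\, dr = 2$ divided by $2\pi$.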

4 Dual Certificates 

To prove our main theorem, it remains to show that one can construct an inexact dual certificate 
Y obeying the conditions of Lemma 2.1. 

4.1 Preliminaries 

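The linear mapping $\mathcal{A}^*\mathcal{A}$ is of the form^6

$$\mathcal{A}^*\mathcal{A} = \sum_{i=1}^m z_iz_i^* \otimes z_iz_i^*,$$

which is another way to express that $\mathcal{A}^*\mathcal{A}(X) = \sum_i \langle z_iz_i^*, X\rangle z_iz_i^*$. Now observe the simple identity:

$$\mathbb{E}[z_iz_i^* \otimes z_iz_i^*] = 2\mathcal{I} + I_n \otimes I_n := \mathcal{S}, \tag{4.1}$$

where $\mathcal{I}$ is the identity operator and $I_n$ the $n$-dimensional identity matrix. Put differently, this
means that for all $X$,

$$\mathcal{S}(X) = 2X + \operatorname{Tr}(X)\, I.$$

The proof is a simple calculation and omitted. It is also not hard to see that the mapping $\mathcal{S}$ is
invertible and its inverse is given by

$$\mathcal{S}^{-1}(X) = \frac{1}{2}\Big(X - \frac{1}{n + 2}\operatorname{Tr}(X)\, I\Big).$$

We will use this object in the definition of our dual certificate.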
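The inverse of $\mathcal{S}$ can be verified by a round trip. The sketch below assumes the formula $\mathcal{S}^{-1}(X) = \frac{1}{2}\big(X - \frac{1}{n+2}\operatorname{Tr}(X) I\big)$, which is what one obtains by solving $2Y + \operatorname{Tr}(Y) I = X$ for $Y$; the function names `S` and `S_inv` and the test matrix are choices made for this illustration.

```python
def S(X):
    """S(X) = 2X + Tr(X) I."""
    n = len(X)
    tr = sum(X[k][k] for k in range(n))
    return [[2 * X[j][k] + (tr if j == k else 0.0) for k in range(n)]
            for j in range(n)]


def S_inv(X):
    """S^{-1}(X) = (X - Tr(X) I / (n + 2)) / 2."""
    n = len(X)
    tr = sum(X[k][k] for k in range(n))
    return [[(X[j][k] - (tr / (n + 2) if j == k else 0.0)) / 2 for k in range(n)]
            for j in range(n)]


X = [[1.0, 2.0, 0.0], [2.0, -3.0, 5.0], [0.0, 5.0, 4.0]]
round_trip = S(S_inv(X))                       # should reproduce X

e1e1 = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
core = S_inv(e1e1)                             # (e1 e1^T - I/(n+2)) / 2 with n = 3
```

For $n = 3$ the matrix $\mathcal{S}^{-1}(e_1e_1^*)$ is diagonal with entries $0.4, -0.1, -0.1$, which is the weight that enters the certificate construction.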
^6 For symmetric matrices, $A \otimes B$ is the linear mapping $H \mapsto \langle A, H\rangle B$.

4.2 Construction



For pedagogical reasons, we first introduce a possible candidate certificate defined by

$$Y := \frac{1}{m}\mathcal{A}^*\mathcal{A}\, \mathcal{S}^{-1}(e_1e_1^*). \tag{4.2}$$

Clearly, $Y$ is in the range of $\mathcal{A}^*$ as required. To justify this choice, the law of large numbers gives
that in the limit of infinitely many samples,

$$\lim_{m \to \infty} \frac{1}{m}\sum_i (z_iz_i^* \otimes z_iz_i^*)\, \mathcal{S}^{-1}(e_1e_1^*) = \mathbb{E}(z_iz_i^* \otimes z_iz_i^*)\, \mathcal{S}^{-1}(e_1e_1^*) = e_1e_1^*.$$

In other words, in the limit of large samples, we have a perfect certificate since $Y_T = e_1e_1^*$ and
$Y_T^\perp = 0$. Our hope is that the sample average is sufficiently close to the population average so that
one can check (2.4). In order to show that this is the case, it will be useful to think of $Y$ (4.2) as
the random sum

$$Y = \frac{1}{m}\sum_{i=1}^m Y_i,$$

where each matrix $Y_i$ is an independent copy of the random matrix

$$\frac{1}{2}\Big[z_1^2 - \frac{1}{n + 2}\|z\|_2^2\Big]\, zz^*,$$

in which $z = (z_1, \ldots, z_n) \sim \mathcal{N}(0, I)$.



We would like to make an important point before continuing. We have seen that all we need from
$Y$ is

$$\|Y_T - e_1e_1^*\|_2 \le 1/3$$

(and $\|Y_T^\perp\| \le 1/2$). This is in stark contrast with David Gross' approach [13] which requires a very
small misfit, i.e. an error of at most $1/n^2$. In turn, this loose bound has an enormous implication:
it eliminates the need for the golfing scheme and allows for the simple certificate candidate (4.2).
In fact, our certificate can be seen as the first iteration of Gross' golfing scheme.
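The behavior of the candidate certificate can be observed with a small Monte Carlo experiment in the real-valued Gaussian model. The sketch below forms the sample average of the matrices $\frac{1}{2}\big[z_1^2 - \frac{1}{n+2}\|z\|_2^2\big] zz^*$ (without any truncation); the sizes, the seed, and the name `candidate_certificate` are assumptions made for this illustration.

```python
import random


def candidate_certificate(n, m, rng):
    """Y = (1/m) sum_i Y_i with Y_i = 0.5 * (z1^2 - ||z||^2/(n+2)) * z z^T, z ~ N(0, I)."""
    Y = [[0.0] * n for _ in range(n)]
    for _ in range(m):
        z = [rng.gauss(0, 1) for _ in range(n)]
        w = 0.5 * (z[0] ** 2 - sum(c * c for c in z) / (n + 2))
        for j in range(n):
            for k in range(n):
                Y[j][k] += w * z[j] * z[k] / m
    return Y


Y = candidate_certificate(n=3, m=20000, rng=random.Random(3))
```

With many samples, `Y` is close to $e_1e_1^*$: the top-left entry is near 1 and the remaining entries are near 0, up to sampling fluctuations.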

4.3 Truncation 

For technical reasons, it is easier to work with a truncated version of $Y$ and our dual certificate is
taken to be

$$Y = \frac{1}{m}\sum_{i=1}^m Y_i 1_{E_i}, \tag{4.3}$$

where the $Y_i$'s are as before and $1_{E_i}$ are independent copies of $1_E$ with

$$E = \{|z_1| \le \sqrt{2\beta \log n}\} \cap \{\|z\|_2 \le \sqrt{3n}\}.$$

We shall work with $\beta = 3$ so that $|z_1| \le \sqrt{6 \log n}$.

Lemma 4.1 Let $Y$ be as in (4.3). Then

$$\mathbb{P}\Big(\|Y_T - e_1e_1^*\|_2 \ge \frac{1}{3}\Big) \le 2\exp\Big(-\gamma \frac{m}{n}\Big), \tag{4.4}$$

where $\gamma > 0$ is an absolute constant. This holds with the proviso that $m \ge c_1 n$ for some numerical
constant $c_1 > 0$, and that $n$ is sufficiently large.

Lemma 4.2 Let $Y$ be as in (4.3). Then

$$\mathbb{P}\Big(\|Y_T^\perp\| \ge \frac{1}{2}\Big) \le 4\exp\Big(-\gamma \frac{m}{\log n}\Big), \tag{4.5}$$

where $\gamma > 0$ is an absolute constant. This holds with the proviso that $m \ge c_1 n \log n$ for some
numerical constant $c_1 > 0$, and that $n$ is sufficiently large.



4.4 Y on T and proof of Lemma 4.1 



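It is obvious that for any symmetric matrix $X \in T$,

\[
\|X\|_2 \le \sqrt{2}\, \|X e_1\|_2
\]

since only the first row and column are nonzero. We have

\[
Y_T e_1 - e_1 = \frac{1}{m} \sum_{i=1}^m y_i 1_{E_i} - \Bigl(\frac{1}{m} \sum_{i=1}^m 1_{E_i^c}\Bigr) e_1, \tag{4.6}
\]

where the $y_i$'s are independent copies of the random vector

\[
y = \frac{1}{2}\Bigl(z_1^2 - \frac{1}{n+2}\|z\|_2^2\Bigr) z_1 z - e_1 := (\xi z_1)\, z - e_1. \tag{4.7}
\]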



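We claim that

\[
\frac{1}{m} \sum_{i=1}^m 1_{E_i^c} \le 1/9
\]

with probability at least $1 - 2e^{-\gamma m}$ for some $\gamma > 0$. This is a simple application of Bernstein's inequality. Set $\pi(\beta) = P(E^c)$ and observe that

\[
\pi(\beta) \le P\bigl(|z_1| \ge \sqrt{2\beta \log n}\bigr) + P\bigl(\|z\|_2^2 \ge 3n\bigr) \le n^{-\beta} + e^{-n/3}. \tag{4.8}
\]

The right-hand side follows from $P(|z_1| \ge t) \le e^{-t^2/2}$, which holds for $t \ge 1$, and from $P(\|z\|_2^2 \ge 3n) \le e^{-n/3}$. In turn, this last bound follows from

\[
P\bigl(\|z\|_2^2 - n \ge \sqrt{2n}\, t + t^2\bigr) \le e^{-t^2/2}.
\]

Returning to Bernstein, this gives

\[
P\Bigl(\frac{1}{m} \sum_{i=1}^m 1_{E_i^c} - \pi(\beta) \ge t\Bigr) \le 2\exp\Bigl(-\frac{m t^2}{2\pi(\beta) + 2t/3}\Bigr).
\]

Setting $t = 1/18$, $\beta = 3$ and taking $n$ large enough so that $\pi(3) \le 1/18$ proves the claim.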
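To give a feel for what "n large enough" means here, the sketch below evaluates the upper bound $n^{-\beta} + e^{-n/3}$ on $\pi(\beta)$ and the rate in the Bernstein exponent. The function names are hypothetical, and the search only concerns the upper bound in (4.8), not $\pi(\beta)$ itself.

```python
import math


def truncation_failure_bound(n, beta=3.0):
    """Upper bound n^(-beta) + exp(-n/3) on pi(beta) = P(E^c), as in (4.8)."""
    return n ** (-beta) + math.exp(-n / 3.0)


def smallest_n_below(threshold, beta=3.0, n_max=10_000):
    """Smallest n >= 2 for which the bound in (4.8) is at most `threshold`.

    The bound is decreasing in n, so the first hit is the answer.
    Returns None if no n <= n_max qualifies.
    """
    for n in range(2, n_max + 1):
        if truncation_failure_bound(n, beta) <= threshold:
            return n
    return None


def bernstein_rate(t, pi_beta):
    """Coefficient of m in the exponent: t^2 / (2 * pi(beta) + 2t/3)."""
    return t * t / (2.0 * pi_beta + 2.0 * t / 3.0)


if __name__ == "__main__":
    print("smallest n with bound <= 1/18:", smallest_n_below(1.0 / 18.0))
    print("rate for t = pi = 1/18:", bernstein_rate(1.0 / 18.0, 1.0 / 18.0))
```

With $t = 1/18$ and the worst case $\pi(3) = 1/18$, the rate evaluates to $1/48$, which is one admissible value of the constant $\gamma$ in the claim.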

The main task is to bound the 2-norm of the sum $\sum_i y_i 1_{E_i}$, and a convenient way to do this is via the vector Bernstein inequality.

Theorem 4.3 (Vector Bernstein inequality) Let $x_i$ be a sequence of independent random vectors and set $V \ge \sum_i \mathbb{E}\|x_i\|_2^2$. Then for all $t \le V / \max \|x_i\|_2$, we have

\[
P\Bigl(\Bigl\|\sum_i (x_i - \mathbb{E} x_i)\Bigr\|_2 \ge \sqrt{V} + t\Bigr) \le e^{-t^2/4V}.
\]

It is because this inequality requires bounded random vectors that we work with the truncation $\sum_{i=1}^m y_i 1_{E_i}$.

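Put $\bar{y} = y 1_E$. Since $\|\bar{y}\|_2 \le \|y\|_2$, we first compute $\mathbb{E}\|y\|_2^2$. We have

\[
\|y\|_2^2 = \|z\|_2^2\, z_1^2\, \xi^2 - 2 z_1^2\, \xi + 1, \qquad \xi = \frac{1}{2}\Bigl(z_1^2 - \frac{1}{n+2}\|z\|_2^2\Bigr),
\]

and a little bit of algebra yields

\[
\|y\|_2^2 = \frac{1}{4} z_1^6 \|z\|_2^2 - \frac{1}{2(n+2)} z_1^4 \|z\|_2^4 + \frac{1}{4(n+2)^2} z_1^2 \|z\|_2^6 - z_1^4 + \frac{1}{n+2} z_1^2 \|z\|_2^2 + 1.
\]

Thus,

\[
\mathbb{E}\|y\|_2^2 = \frac{1}{4}(15n + 90) - \frac{1}{2(n+2)}(3n^2 + 30n + 72) + \frac{1}{4(n+2)}(n+4)(n+6) - 1 \le 4(n+4), \tag{4.9}
\]

where we have used the following identities:

\begin{align*}
\mathbb{E}\bigl[z_1^2 \|z\|_2^2\bigr] &= n + 2, \\
\mathbb{E}\bigl[z_1^2 \|z\|_2^6\bigr] &= (n+2)(n+4)(n+6), \\
\mathbb{E}\bigl[z_1^4 \|z\|_2^4\bigr] &= 3n^2 + 30n + 72, \\
\mathbb{E}\bigl[z_1^6 \|z\|_2^2\bigr] &= 15n + 90.
\end{align*}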
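The four moment identities, and the resulting bound on $\mathbb{E}\|y\|_2^2$, can be checked exactly. The sketch below writes $\|z\|_2^2 = z_1^2 + R$ with $R$ a chi-square variable with $n - 1$ degrees of freedom independent of $z_1$, and expands with the binomial theorem; it uses integer and rational arithmetic only, and all function names are illustrative.

```python
from fractions import Fraction
from math import comb


def gaussian_even_moment(k):
    """E[g^(2k)] = 1 * 3 * ... * (2k - 1) for a standard normal g."""
    result = 1
    for j in range(1, 2 * k, 2):
        result *= j
    return result


def chi2_moment(dof, k):
    """E[R^k] = dof (dof + 2) ... (dof + 2k - 2) for R ~ chi-square(dof)."""
    result = 1
    for j in range(k):
        result *= dof + 2 * j
    return result


def mixed_moment(a, b, n):
    """E[z_1^(2a) ||z||_2^(2b)] for z ~ N(0, I_n), computed exactly."""
    return sum(
        comb(b, j) * gaussian_even_moment(a + j) * chi2_moment(n - 1, b - j)
        for j in range(b + 1)
    )


def expected_sq_norm_y(n):
    """E||y||_2^2, term by term from the expansion of ||y||_2^2."""
    return (
        Fraction(mixed_moment(3, 1, n), 4)
        - Fraction(mixed_moment(2, 2, n), 2 * (n + 2))
        + Fraction(mixed_moment(1, 3, n), 4 * (n + 2) ** 2)
        - mixed_moment(2, 0, n)
        + Fraction(mixed_moment(1, 1, n), n + 2)
        + 1
    )


if __name__ == "__main__":
    for n in (1, 2, 10, 128):
        print(n, mixed_moment(2, 2, n), float(expected_sq_norm_y(n)), 4 * (n + 4))
```

Because the arithmetic is exact, agreement with the closed forms is an identity check rather than a numerical approximation.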

Second, on the event of interest we have $|\xi| \le \beta \log n$ (assuming $2\beta \log n \ge 3$), $|z_1| \le \sqrt{2\beta \log n}$ and $\|z\|_2 \le \sqrt{3n}$ and, therefore,

\[
\|\bar{y}\|_2 \le \sqrt{6n}\, (\beta \log n)^{3/2} + 1 \le \sqrt{7n}\, (\beta \log n)^{3/2}
\]

provided $n$ is large enough.

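Third, observe that by symmetry, all the entries of $\bar{y}$ but the first have mean zero. Hence,

\[
\|\mathbb{E}\bar{y}\|_2 = |\mathbb{E}\bar{y}_1| = |\mathbb{E}\, 1_{E^c}\, y_1| \le \sqrt{P(E^c)}\, \sqrt{\mathbb{E} y_1^2}.
\]

We have

\[
y_1^2 = \xi^2 z_1^4 - 2 \xi z_1^2 + 1,
\]

and using the identities above

\[
\mathbb{E} y_1^2 = \frac{101}{4} - \frac{27n^2 + 210n + 288}{4(n+2)^2} \le 22,
\]

which gives

\[
\|\mathbb{E}\bar{y}\|_2 \le \sqrt{22\,(n^{-\beta} + e^{-n/3})}.
\]

Finally, with $V = 4m(n+4)$, Bernstein's inequality gives that for each $t \le 4(n+4)/[\sqrt{7n}\,(\beta \log n)^{3/2}]$,

\[
\Bigl\|m^{-1} \sum_{i=1}^m (\bar{y}_i - \mathbb{E}\bar{y}_i)\Bigr\|_2 \ge 2\sqrt{\frac{n+4}{m}} + t
\]

with probability at most $\exp\bigl(-\frac{m t^2}{16(n+4)}\bigr)$. It follows that

\[
\Bigl\|m^{-1} \sum_i \bar{y}_i\Bigr\|_2 \ge \sqrt{22\,(n^{-\beta} + e^{-n/3})} + 2\sqrt{\frac{n+4}{m}} + t
\]

with at most the same probability. Our result follows by taking $t = 1/6$, $\beta = 3$, $m \ge c_1 n$ where $n$ and $c_1$ are sufficiently large such that

\[
\sqrt{22\,(n^{-\beta} + e^{-n/3})} + 2\sqrt{\frac{n+4}{m}} + \frac{1}{6} \le \frac{2}{9}.
\]
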

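4.5 Y on $T^\perp$ and proof of Lemma 4.2

We have

\[
Y_T^\perp = \frac{1}{m} \sum_{i=1}^m X_i 1_{E_i},
\]

where the $X_i$'s are independent copies of the random matrix

\[
X = \frac{1}{2}\Bigl[z_1^2 - \frac{1}{n+2}\|z\|_2^2\Bigr] \mathcal{P}_{T^\perp}(z z^*). \tag{4.10}
\]

One natural way to bound the norm of this random sum is via the operator Bernstein inequality. We develop a more customized approach, which gives sharper results.

Decompose $X$ as

\[
X = \frac{1}{2}(z_1^2 - 1)\, \mathcal{P}_{T^\perp}(z z^*) + \frac{1}{2}\Bigl(1 - \frac{1}{n+2}\|z\|_2^2\Bigr) \mathcal{P}_{T^\perp}(z z^*) := X^{(0)} + X^{(1)}.
\]

Note that since $z_1$ and $\mathcal{P}_{T^\perp}(z z^*)$ are independent, we have $\mathbb{E} X^{(0)} = 0$ and thus $\mathbb{E} X^{(1)} = 0$ since $\mathbb{E} X = 0$. With $\bar{X}^{(0)} = X^{(0)} 1_E$ and similarly for $\bar{X}^{(1)}$, it then suffices to show that

\[
\Bigl\|\sum_i \bar{X}_i^{(0)}\Bigr\| \le m/4 \quad \text{and} \quad \Bigl\|\sum_i \bar{X}_i^{(1)}\Bigr\| \le m/4 \tag{4.11}
\]

with large probability. Write the norm as

\[
\Bigl\|\sum_i \bar{X}_i^{(0)}\Bigr\| = \sup_u \Bigl|\sum_i \langle u, \bar{X}_i^{(0)} u \rangle\Bigr|,
\]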

where the supremum is over all unit vectors $u$ that are orthogonal to $e_1$. The strategy is now to find a bound on the right-hand side for each fixed $u$ and apply a covering argument to control the supremum over the whole unit sphere. In order to do this, we shall make use of a classical large deviation result.

Theorem 4.4 (Bernstein inequality) Let $\{X_i\}$ be a finite sequence of independent random variables. Suppose that there exist $V$ and $c_0$ such that for all $k \ge 3$,

\[
\sum_i \mathbb{E}|X_i|^k \le \frac{1}{2}\, k!\, V c_0^{k-2}.
\]

Then for all $t \ge 0$,

\[
P\Bigl(\Bigl|\sum_i X_i - \mathbb{E} X_i\Bigr| \ge t\Bigr) \le 2\exp\Bigl(-\frac{t^2}{2V + 2c_0 t}\Bigr). \tag{4.12}
\]
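To make the moment condition concrete, the sketch below checks it for a simple illustrative family: $X_i = g_i^2$ with $g_i$ independent standard normals, whose moments are $\mathbb{E}|X_i|^k = 1 \times 3 \times \cdots \times (2k-1)$. The function names, the choice of family, and the finite range of $k$ tested are assumptions made for this example, not part of the theorem.

```python
from math import exp, factorial


def chi2_one_moment(k):
    """E[(g^2)^k] = 1 * 3 * ... * (2k - 1) for a standard normal g."""
    result = 1
    for j in range(1, 2 * k, 2):
        result *= j
    return result


def moment_condition_holds(num_vars, V, c0, k_max=40):
    """Check sum_i E|X_i|^k <= (1/2) k! V c0^(k-2) for 3 <= k <= k_max.

    Here X_1, ..., X_num_vars are i.i.d. squared standard normals, so the
    left-hand side is num_vars * (2k - 1)!!. Both sides are multiplied by 2
    to stay in exact integer arithmetic.
    """
    return all(
        2 * num_vars * chi2_one_moment(k) <= factorial(k) * V * c0 ** (k - 2)
        for k in range(3, k_max + 1)
    )


def bernstein_tail(t, V, c0):
    """Right-hand side of the tail bound: 2 exp(-t^2 / (2V + 2 c0 t))."""
    return 2.0 * exp(-t * t / (2.0 * V + 2.0 * c0 * t))


if __name__ == "__main__":
    m = 10
    print(moment_condition_holds(m, V=8 * m, c0=2))   # condition satisfied
    print(moment_condition_holds(m, V=1, c0=2))       # V too small
    print(bernstein_tail(100.0, V=8 * m, c0=2))
```

For this family the pair $V = 8m$, $c_0 = 2$ works because $(2k-1)!! \le 2^k k!$.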



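For the first sum in (4.11), we write

\[
\sum_i \langle u, \bar{X}_i^{(0)} u \rangle = \sum_i \eta_i 1_{E_i},
\]

where the $\eta_i$'s are independent copies of

\[
\eta = \frac{1}{2}(z_1^2 - 1)\, \langle z, u \rangle^2.
\]

The point of the decomposition $X^{(0)} + X^{(1)}$ is that $z_1$ and $\langle z, u \rangle$ are independent since $u$ is orthogonal to $e_1$. We have $\mathbb{E}\eta = 0$ and for $k \ge 2$,

\[
\mathbb{E}|\eta 1_E|^k \le 2^{-k}\, \mathbb{E}\bigl|(z_1^2 - 1)\, 1_{\{z_1^2 \le 2\beta \log n\}}\bigr|^k\, \mathbb{E}|\langle z, u \rangle|^{2k}.
\]

First,

\[
\mathbb{E}\bigl|(z_1^2 - 1)\, 1_{\{z_1^2 \le 2\beta \log n\}}\bigr|^k \le (2\beta \log n)^{k-2}\, \mathbb{E}(z_1^2 - 1)^2 = 2(2\beta \log n)^{k-2}.
\]

Second, the moments of a chi-square variable with one degree of freedom are well known:

\[
\mathbb{E}|\langle z, u \rangle|^{2k} = 1 \times 3 \times \cdots \times (2k-1) \le 2^k k!.
\]

Hence we can apply Bernstein's inequality with $V = 4m$ and $c_0 = 2\beta \log n$, and obtain

\[
P\Bigl(\Bigl|\sum_i \eta_i 1_{E_i} - \mathbb{E}[\eta_i 1_{E_i}]\Bigr| \ge mt\Bigr) \le 2\exp\Bigl(-\frac{m}{4}\, \frac{t^2}{2 + \beta t \log n}\Bigr).
\]

We now note that

\[
|\mathbb{E}\eta 1_E| = |\mathbb{E}\eta 1_{E^c}| \le \sqrt{P(E^c)}\, \sqrt{\mathbb{E}\eta^2} = \sqrt{3\pi(\beta)/2},
\]

which gives

\[
P\Bigl(m^{-1}\Bigl|\sum_i \eta_i 1_{E_i}\Bigr| \ge t + \sqrt{3\pi(\beta)/2}\Bigr) \le 2\exp\Bigl(-\frac{m}{4}\, \frac{t^2}{2 + \beta t \log n}\Bigr).
\]

For instance, take $t = 1/12$, $\beta = 3$, $m \ge c_1 n$ and $n$ large enough to get

\[
P\Bigl(m^{-1}\Bigl|\sum_i \eta_i 1_{E_i}\Bigr| \ge 1/8\Bigr) \le 2\exp\Bigl(-\gamma \frac{m}{\log n}\Bigr).
\]

To derive a bound about $\|\sum_i \bar{X}_i^{(0)}\|$, we use (see Lemma 4 in [21])

\[
\sup_u \Bigl|\sum_i \langle u, \bar{X}_i^{(0)} u \rangle\Bigr| \le 2 \sup_{u \in \mathcal{N}_{1/4}} \Bigl|\sum_i \langle u, \bar{X}_i^{(0)} u \rangle\Bigr|,
\]

where $\mathcal{N}_{1/4}$ is a 1/4-net of the unit sphere $\{u : \|u\|_2 = 1, u \perp e_1\}$. Since $|\mathcal{N}_{1/4}| \le 9^n$,

\[
P\Bigl(m^{-1}\Bigl\|\sum_i \bar{X}_i^{(0)}\Bigr\| \ge 1/4\Bigr) \le P\Bigl(m^{-1} \sup_{u \in \mathcal{N}_{1/4}} \Bigl|\sum_i \langle u, \bar{X}_i^{(0)} u \rangle\Bigr| \ge 1/8\Bigr) \le 9^n \times 2\exp\Bigl(-\gamma \frac{m}{\log n}\Bigr).
\]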
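The net argument can be illustrated in the smallest nontrivial setting. The sketch below builds a 1/4-net of the unit circle in the plane and checks, for a symmetric 2 x 2 matrix, that the operator norm is at most twice the largest quadratic form over the net. This is a toy two-dimensional stand-in, not the sphere orthogonal to $e_1$ used in the proof, and the helper names are hypothetical.

```python
import math


def quarter_net_on_circle():
    """Equally spaced points forming a 1/4-net of the unit circle.

    Two unit vectors at angle a are at distance 2 sin(a / 2). With spacing s
    every point is within angle s / 2 of the net, so 2 sin(s / 4) <= 1/4
    is required, i.e. s <= 4 asin(1/8).
    """
    max_spacing = 4.0 * math.asin(1.0 / 8.0)
    count = math.ceil(2.0 * math.pi / max_spacing)
    return [
        (math.cos(2.0 * math.pi * j / count), math.sin(2.0 * math.pi * j / count))
        for j in range(count)
    ]


def quadratic_form(A, u):
    """<u, A u> for a symmetric 2 x 2 matrix A = ((a, b), (b, d))."""
    (a, b), (_, d) = A
    return a * u[0] * u[0] + 2.0 * b * u[0] * u[1] + d * u[1] * u[1]


def operator_norm_sym2(A):
    """Largest absolute eigenvalue of a symmetric 2 x 2 matrix."""
    (a, b), (_, d) = A
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return max(abs(mean + radius), abs(mean - radius))


if __name__ == "__main__":
    net = quarter_net_on_circle()
    A = ((1.0, 2.0), (2.0, -3.0))
    net_sup = max(abs(quadratic_form(A, u)) for u in net)
    print(len(net), operator_norm_sym2(A), 2.0 * net_sup)
```

In the plane the net has 13 points, well under the generic bound $9^n = 81$ for $n = 2$.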



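We deal with the second term in a similar way, and write

\[
\sum_i \langle u, \bar{X}_i^{(1)} u \rangle = \sum_i \eta_i 1_{E_i},
\]

where the $\eta_i$'s are now independent copies of

\[
\eta = \frac{1}{2}\Bigl(1 - \frac{1}{n+2}\|z\|_2^2\Bigr) \langle z, u \rangle^2.
\]

On $E$, $\|z\|_2^2 \le 3n$ and, therefore, $\mathbb{E}|\eta 1_E|^k \le 2^k k!$. We can apply Bernstein's inequality with $c_0 = 2$ and $V = 8m$, which gives

\[
P\Bigl(\Bigl|\sum_i \eta_i 1_{E_i} - \mathbb{E}[\eta_i 1_{E_i}]\Bigr| \ge mt\Bigr) \le 2\exp\Bigl(-\frac{m}{4}\, \frac{t^2}{4 + t}\Bigr).
\]

The remainder of the proof is identical to that above and is therefore omitted.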



4.6 Proof of Theorem 1.1 

We now assemble the various intermediate results to establish Theorem 1.1. As pointed out, Theorem 1.1 follows immediately from Lemma 2.1, which in turn hinges on the validity of the conditions stated in (2.2), (2.3), and (2.4).

Lemma 3.1 asserts that condition (2.2) holds with probability of failure at most $p_1$, where $p_1 = 2e^{-\gamma_1 m}$; here and below, $\gamma_1, \ldots, \gamma_4$ are positive numerical constants. Similarly, Lemma 3.2 shows that condition (2.3) holds with probability of failure at most $p_2$, where $p_2 = 3e^{-\gamma_2 m}$. In both cases we need that $m \ge cn$ for an absolute constant $c > 0$.

Proceeding to the dual certificate in (2.4), we note that Lemma 4.1 establishes the first part of the dual certificate with a probability of failure at most $p_3$, where $p_3 = 3e^{-\gamma_3 m/n}$. The second part of the dual certificate in (2.4) is shown in Lemma 4.2 to hold with probability of failure at most $p_4$, where $p_4 = 4e^{-\gamma_4 m/\log n}$. In the former case we need $m \ge cn$ for an absolute constant $c > 0$ and in the latter $m \ge c' n \log n$.

Finally, the union bound gives that under the hypotheses of Theorem 1.1, exact recovery holds with probability at least $1 - 3e^{-\gamma m/n}$ for some $\gamma > 0$, as claimed.

5 The Complex Model



This section proves that Theorem 1.1 holds for the complex model as well. Not surprisingly, the 
main steps of the proof are the same as in the real case, but there are here and there some noteworthy 
differences. Instead of deriving the whole proof, we will carefully indicate the nontrivial changes 
that need to be carried out. 

First, we can work with $x = e_1$ because of rotational invariance, and with independent complex-valued Gaussian sequences $z_i \sim \mathcal{CN}(0, I, 0)$. This means that the real and imaginary parts of $z_i$ are independent white noise sequences with variance 1/2.

The key Lemma 2.1 only requires a slight adjustment in the numerical constants. The reason for this is that while Lemma 3.1 does not require any modification, Lemma 3.2 changes slightly; in particular, the numerical constants are somewhat different. Here is the properly adjusted complex version.

Lemma 5.1 Fix $\delta > 0$. Then there are positive numerical constants $c_0$ and $\gamma_0$ such that if $m \ge c_0 [\delta^{-2} \log \delta^{-1}]\, n$, $\mathcal{A}$ has the following property with probability at least $1 - 3e^{-\gamma_0 m}$: for any Hermitian rank-2 matrix $X$,

\[
\frac{1}{m}\|\mathcal{A}(X)\|_1 \ge 2(\sqrt{2} - 1)(1 - \delta)\|X\| \ge 0.828(1 - \delta)\|X\|. \tag{5.1}
\]

The proof of this lemma follows essentially the proof of Lemma 3.2. The function $f(t)$ (cf. equation (3.3)) now takes the form

\[
f(t) = \mathbb{E}|\xi| = \frac{1 + t^2}{1 + t}, \tag{5.2}
\]

where $\xi = |Z_1|^2 - t|Z_2|^2$, with $Z_1$ and $Z_2$ independent $\mathcal{CN}(0, 1, 0)$, as demonstrated in the following lemma.

Lemma 5.2 Let $Z_1$ and $Z_2$ be independent $\mathcal{CN}(0, 1, 0)$ variables and $t \in [0, 1]$. We have

\[
\mathbb{E}\bigl||Z_1|^2 - t|Z_2|^2\bigr| = f(t),
\]

where $f(t)$ is given by (5.2).



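Proof Set

\[
\rho = \frac{1 - t}{1 + t} \quad \text{and} \quad \cos\theta = \rho
\]

in which $\theta \in [0, \pi/2]$. By using polar coordinates for the variables $(x_1, y_1)$ associated with $Z_1$ and $(x_2, y_2)$ associated with $Z_2$, we have

\begin{align*}
\mathbb{E}\bigl||Z_1|^2 - t|Z_2|^2\bigr| &= \frac{1}{2} \int_0^\infty \int_0^\infty |r_1^2 - t r_2^2|\, r_1 r_2\, e^{-r_1^2/2} e^{-r_2^2/2}\, dr_1\, dr_2 \\
&= \frac{1}{8} \int_0^\infty r^5 e^{-r^2/2}\, dr \int_0^{2\pi} |\sin\phi \cos\phi|\, |\cos^2\phi - t \sin^2\phi|\, d\phi,
\end{align*}

where we used polar coordinates again in variables $(r_1, r_2)$. Now using the identities $\cos^2\phi = (1 + \cos 2\phi)/2$, $\sin^2\phi = (1 - \cos 2\phi)/2$ and $2\sin\phi\cos\phi = \sin 2\phi$, we have

\begin{align*}
\mathbb{E}\bigl||Z_1|^2 - t|Z_2|^2\bigr| &= \frac{1 + t}{4} \int_0^{2\pi} |\sin 2\phi|\, |\cos 2\phi + \rho|\, d\phi \\
&= \frac{1 + t}{2} \Bigl[\int_0^\theta \sin\phi\, (\cos\phi - \rho)\, d\phi + \int_\theta^\pi \sin\phi\, (\rho - \cos\phi)\, d\phi\Bigr] \\
&= \frac{1}{2}(1 + t)\Bigl[-\frac{1}{2}\cos 2\theta + 2\rho\cos\theta + \frac{1}{2}\Bigr] \\
&= \frac{1}{2}(1 + t)\bigl[\rho^2 + 1\bigr] \\
&= \frac{1 + t^2}{1 + t},
\end{align*}

as claimed. ■

Figure 3: The function $f(t)$ in (5.2) as a function of $t$.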

The graph of $f(t)$ is shown in Figure 3. The minimum of this function on $[0, 1]$ is $2(\sqrt{2} - 1) > 0.828$. Furthermore, the covering argument in the proof of Lemma 3.2 has to be adapted; for example, unit spheres need to be replaced by complex unit spheres.
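The closed form for $f$ and its minimum can be cross-checked numerically. The sketch below compares $(1 + t^2)/(1 + t)$ with a seeded Monte Carlo estimate of $\mathbb{E}\bigl||Z_1|^2 - t|Z_2|^2\bigr|$, where the real and imaginary parts of each variable have variance 1/2, and locates the minimizer on a grid. The sample size, seed, and grid resolution are arbitrary choices made for this example.

```python
import math
import random


def f(t):
    """Closed form (1 + t^2) / (1 + t)."""
    return (1.0 + t * t) / (1.0 + t)


def monte_carlo_f(t, num_samples=200_000, seed=0):
    """Estimate E| |Z1|^2 - t |Z2|^2 | for independent complex Gaussians.

    Each Z has independent real and imaginary parts with variance 1/2.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(0.5)
    total = 0.0
    for _ in range(num_samples):
        z1_sq = rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2
        z2_sq = rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2
        total += abs(z1_sq - t * z2_sq)
    return total / num_samples


def argmin_on_grid(num_points=100_001):
    """Grid point in [0, 1] at which f is smallest."""
    grid = [j / (num_points - 1) for j in range(num_points)]
    return min(grid, key=f)


if __name__ == "__main__":
    t_star = argmin_on_grid()
    print("argmin:", t_star, "vs sqrt(2) - 1 =", math.sqrt(2.0) - 1.0)
    print("min value:", f(t_star), "vs 2(sqrt(2) - 1) =", 2.0 * (math.sqrt(2.0) - 1.0))
    print("Monte Carlo at t = 0.5:", monte_carlo_f(0.5), "closed form:", f(0.5))
```

Setting the derivative of the closed form to zero gives $t^2 + 2t - 1 = 0$, so the minimizer is $t = \sqrt{2} - 1$, which is what the grid search should recover.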

A consequence of this change in numerical values is that the numerical factors in Lemma 2.2 need 
to be adjusted. 

Lemma 5.3 Any feasible matrix $H$ such that $\operatorname{Tr}(H) \le 0$ must obey

\\Ht\\2 < II-^tII- 

Finally, with all of this in place, Lemma 2.1 becomes this:

Lemma 5.4 Suppose that the mapping $\mathcal{A}$ obeys the following two properties: for some $\delta \le 3/13$,

1) for all positive semidefinite matrices $X$,

\[
m^{-1}\|\mathcal{A}(X)\|_1 \le (1 + \delta)\|X\|_1; \tag{5.3}
\]

2) for all matrices $X \in T$,

\[
m^{-1}\|\mathcal{A}(X)\|_1 \ge 2(\sqrt{2} - 1)(1 - \delta)\|X\| \ge 0.828(1 - \delta)\|X\|. \tag{5.4}
\]

Suppose further that there exists $Y$ in the range of $\mathcal{A}^*$ obeying

\[
\|Y_T - e_1 e_1^*\|_2 \le 1/5 \quad \text{and} \quad \|Y_T^\perp\| \le 1/2. \tag{5.5}
\]

Then $e_1 e_1^*$ is the unique minimizer to (1.5).

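We now turn our attention to the properties of the dual certificate we studied in Section 4. The first difference is that the expectation of $\mathcal{A}^*\mathcal{A}$ in (4.1) is different in the complex case. A simple calculation yields

\[
\mathbb{E}\, \frac{1}{m}\mathcal{A}^*\mathcal{A} = \mathcal{I} + I_n \otimes I_n := \mathcal{S}.
\]

This means that for all $X$,

\[
\mathcal{S}(X) = X + \operatorname{Tr}(X) I. \tag{5.6}
\]

We note that in this case

\[
\mathcal{S}^{-1} = \mathcal{I} - \frac{1}{n+1} I_n \otimes I_n \implies \mathcal{S}^{-1}(X) = X - \frac{1}{n+1}\operatorname{Tr}(X) I_n. \tag{5.7}
\]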
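That the map $X \mapsto X - \frac{1}{n+1}\operatorname{Tr}(X) I_n$ really inverts $X \mapsto X + \operatorname{Tr}(X) I$ is easy to confirm in exact arithmetic. The sketch below does so on a small symmetric matrix with rational entries; the function names are illustrative, and real rational entries are used only because the standard `fractions` module does not cover complex numbers (the algebra is the same for Hermitian matrices).

```python
from fractions import Fraction


def trace(X):
    """Trace of a square matrix stored as a list of rows."""
    return sum(X[i][i] for i in range(len(X)))


def S(X):
    """The operator S(X) = X + Tr(X) I of the complex model."""
    n, tr = len(X), trace(X)
    return [[X[i][j] + (tr if i == j else 0) for j in range(n)] for i in range(n)]


def S_inv(X):
    """Candidate inverse S^{-1}(X) = X - Tr(X) I / (n + 1)."""
    n = len(X)
    shift = Fraction(trace(X)) / (n + 1)
    return [[X[i][j] - (shift if i == j else 0) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    X = [
        [Fraction(2), Fraction(1, 3), Fraction(0)],
        [Fraction(1, 3), Fraction(-1), Fraction(5, 7)],
        [Fraction(0), Fraction(5, 7), Fraction(4)],
    ]
    print(S(S_inv(X)) == X, S_inv(S(X)) == X)
```

The check works because $\operatorname{Tr}(\mathcal{S}(X)) = (n + 1)\operatorname{Tr}(X)$, so the shift removed by the inverse is exactly the shift that was added.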

We of course use this new $\mathcal{S}^{-1}$ in the complex analog of the candidate certificate (4.3). A consequence is that in the proof of Lemma 4.1, for instance, (4.7) now takes the form

\[
y = \Bigl(|z_1|^2 - \frac{1}{n+1}\|z\|_2^2\Bigr) \bar{z}_1 z - e_1 := (\xi \bar{z}_1)\, z - e_1. \tag{5.8}
\]

To bound the 2-norm of a sum of i.i.d. such random variables (as in Lemma 4.1), we employ the same Bernstein inequality for real vectors, using the fact that $\|z\|_2 = \|(\Re(z), \Im(z))\|_2$ for any complex vector $z$. Similarly (4.10) becomes

\[
X = \Bigl(|z_1|^2 - \frac{1}{n+1}\|z\|_2^2\Bigr) \mathcal{P}_{T^\perp}(z z^*). \tag{5.9}
\]

To bound the operator norm of a sum of i.i.d. such random matrices (as in Lemma 4.2), we again use a covering argument, this time working with chi-square variables with two degrees of freedom, since $|\langle z, u \rangle|^2$ is distributed as $\frac{1}{2}\chi^2(2)$. Since the $|\langle z, u \rangle|^2$ are real random variables, we use the same version of the Bernstein inequality as in the real-valued case. The only difference is that the moments are now

\[
\mathbb{E}|\langle z, u \rangle|^{2k} = 2^{-k} \times (2 + 0) \times (2 + 2) \times (2 + 4) \times \cdots \times (2 + 2k - 2) = k!
\]

6 Stability



This section proves the stability of our approach, namely, Theorem 1.2. Our proof parallels the argument of Candes and Plan for showing the stability of matrix completion [8] as well as that of Gross et al. in [14].

Just as before, we prove the theorem in the real case since the complex case is essentially the same. Further, we may still take $x = e_1$ without loss of generality. We shall prove stability when the $z_i$'s are i.i.d. $\mathcal{N}(0, I_n)$ and later explain how one can easily transfer a result for Gaussian vectors to a result for vectors sampled on the sphere. Under the assumptions of the theorem, the RIP-1-like properties, namely, Lemmas 3.1 and 3.2, hold with a numerical constant $\delta_1$ we shall specify later. Under the same hypotheses, the dual certificate $Y$ (4.2) obeys

\[
\|\mathcal{P}_T(Y - e_1 e_1^*)\|_2 \le \gamma, \qquad \|Y_T^\perp\| \le \frac{1}{2},
\]

in which $\gamma$ is a numerical constant also specified later.

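Set $X = xx^* = e_1 e_1^*$ and write $\hat{X} = X + H$. We begin by recording two useful properties. First, since $X$ is feasible for our optimization problem, we have

\[
\operatorname{Tr}(X + H) \le \operatorname{Tr}(X) \iff \operatorname{Tr}(H) \le 0. \tag{6.1}
\]

Second, the triangle inequality gives

\[
\|\mathcal{A}(H)\|_2 = \|\mathcal{A}(\hat{X} - X)\|_2 \le \|\mathcal{A}(\hat{X}) - b\|_2 + \|b - \mathcal{A}(X)\|_2 \le 2\epsilon. \tag{6.2}
\]

In the noiseless case, $\mathcal{A}(H) = 0 \implies \langle H, Y \rangle = 0$, by construction. In the noisy case, a third property is that $|\langle H, Y \rangle|$ is at most on the order of $\epsilon$. Indeed,

\[
m|\langle H, Y \rangle| = |\langle \mathcal{A}(H), \mathcal{A}\mathcal{S}^{-1}(X) \rangle| \le \|\mathcal{A}(H)\|_\infty\, \|\mathcal{A}\mathcal{S}^{-1}(X)\|_1.
\]

Since $\|\mathcal{A}(H)\|_\infty \le \|\mathcal{A}(H)\|_2$ and

\[
\|\mathcal{A}\mathcal{S}^{-1}(X)\|_1 \le m(1 + \delta_1)\|\mathcal{S}^{-1}(X)\|_1 \le m(1 + \delta_1),
\]

we obtain

\[
|\langle H, Y \rangle| \le 2\epsilon(1 + \delta_1). \tag{6.3}
\]

We now reproduce the steps of the proof of Lemma 2.1, and obtain

\[
0 \ge \operatorname{Tr}(H_T) + \operatorname{Tr}(H_T^\perp) \ge \frac{1}{2}\operatorname{Tr}(H_T^\perp) - \gamma\|H_T\|_2 - |\langle H, Y \rangle|,
\]

which gives

\[
\operatorname{Tr}(H_T^\perp) \le 4\epsilon(1 + \delta_1) + 2\gamma\|H_T\|_2 \le 4\epsilon(1 + \delta_1) + 2\sqrt{2}\,\gamma\|H_T\|, \tag{6.4}
\]

where we recall that $H_T$ has rank at most 2. We also have

\begin{align*}
0.94(1 - \delta_1)\|H_T\| \le m^{-1}\|\mathcal{A}(H_T)\|_1 &\le m^{-1}\|\mathcal{A}(H)\|_1 + m^{-1}\|\mathcal{A}(H_T^\perp)\|_1 \\
&\le m^{-1/2}\|\mathcal{A}(H)\|_2 + (1 + \delta_1)\operatorname{Tr}(H_T^\perp) \tag{6.5} \\
&\le 2m^{-1/2}\epsilon + (1 + \delta_1)\operatorname{Tr}(H_T^\perp), \tag{6.6}
\end{align*}

where the second inequality follows from the RIP-1 property together with the Cauchy-Schwarz inequality. Plugging this last bound into (6.4) gives

\[
\operatorname{Tr}(H_T^\perp) \le 4\epsilon(1 + \delta_1 + \gamma\alpha m^{-1/2}) + \beta\gamma \operatorname{Tr}(H_T^\perp),
\]

where

\[
\alpha = \frac{\sqrt{2}}{0.94(1 - \delta_1)}, \qquad \beta = 2\alpha(1 + \delta_1).
\]

Hence, when $\beta\gamma < 1$, we have

\[
\operatorname{Tr}(H_T^\perp) \le \frac{4(1 + \delta_1 + \gamma\alpha m^{-1/2})}{1 - \beta\gamma}\, \epsilon := c_1 \epsilon.
\]

In addition, (6.6) then gives

\[
\|H_T\| \le \frac{2m^{-1/2} + (1 + \delta_1) c_1}{0.94(1 - \delta_1)}\, \epsilon = c_2 \epsilon.
\]

In conclusion,

\[
\|H\|_2 \le \|H_T\|_2 + \|H_T^\perp\|_2 \le \sqrt{2}\|H_T\| + \|H_T^\perp\|_1 \le (\sqrt{2} c_2 + c_1)\epsilon = c_0 \epsilon,
\]

and we also have $\|H\| \le (c_2 + c_1)\epsilon$.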
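The chain of constants in this argument can be traced numerically. The sketch below computes $\alpha$, $\beta$, $c_1$, $c_2$ and $c_0$ from $\delta_1$, $\gamma$ and $m$ by solving the self-referential bound on $\operatorname{Tr}(H_T^\perp)$ and substituting into (6.6). The input values used in the example are hypothetical placeholders, since the text only says that $\delta_1$ and $\gamma$ are specified later; the function name is likewise illustrative.

```python
import math


def stability_constants(delta1, gamma, m):
    """Constants of the stability argument, assuming beta * gamma < 1.

    c1 solves c1 = 4 (1 + delta1 + gamma alpha / sqrt(m)) + beta gamma c1,
    c2 comes from dividing (6.6) by 0.94 (1 - delta1),
    c0 = sqrt(2) c2 + c1 bounds the Frobenius norm of H per unit epsilon.
    """
    alpha = math.sqrt(2.0) / (0.94 * (1.0 - delta1))
    beta = 2.0 * alpha * (1.0 + delta1)
    if beta * gamma >= 1.0:
        raise ValueError("the argument requires beta * gamma < 1")
    c1 = 4.0 * (1.0 + delta1 + gamma * alpha / math.sqrt(m)) / (1.0 - beta * gamma)
    c2 = (2.0 / math.sqrt(m) + (1.0 + delta1) * c1) / (0.94 * (1.0 - delta1))
    c0 = math.sqrt(2.0) * c2 + c1
    return {"alpha": alpha, "beta": beta, "c1": c1, "c2": c2, "c0": c0}


if __name__ == "__main__":
    # delta1 = 0.1 and gamma = 0.1 are illustrative values, not the paper's.
    for name, value in stability_constants(delta1=0.1, gamma=0.1, m=1000).items():
        print(f"{name} = {value:.4f}")
```

The guard on $\beta\gamma$ mirrors the condition under which the bound on $\operatorname{Tr}(H_T^\perp)$ can be solved for; a smaller $\gamma$ (a better dual certificate) yields smaller constants.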

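It remains to show why the fact that $\hat{X}$ is close to $X = xx^*$ in the Frobenius or operator norm produces a good estimate of $x$ (recall that $x = e_1$). Set $\epsilon_0 := \|\hat{X} - X\| \le c_0 \epsilon$. Below, $\hat\lambda_1 \ge 0$ is the largest eigenvalue of $\hat{X} \succeq 0$, and $\hat{u}_1$ the first eigenvector. Likewise, $\lambda_1 = 1$ is the top eigenvalue of $X = e_1 e_1^*$. Since $\operatorname{Tr}(\hat{X}) \le \operatorname{Tr}(X)$,

\[
\hat\lambda_1 \le \lambda_1.
\]

In the other direction, we know from perturbation theory that

\[
|\hat\lambda_1 - \lambda_1| \le \|\hat{X} - X\| = \epsilon_0.
\]

Assuming that $\epsilon_0 < 1$, this gives $\hat\lambda_1 \in [1 - \epsilon_0, 1]$. The sin-$\theta$-Theorem [11] implies that

\[
|\sin\theta| \le \frac{\epsilon_0}{|\hat\lambda_1|} \le \frac{\epsilon_0}{1 - \epsilon_0},
\]

where $0 \le \theta \le \pi/2$ is the angle between the spaces spanned by $\hat{u}_1$ and $e_1$. Writing

\[
\hat{u}_1 = \cos\theta\, e_1 + \sin\theta\, e_1^\perp
\]

in which $e_1^\perp$ is a unit vector orthogonal to $e_1$, Pythagoras' relationship gives

\[
\|e_1 - \sqrt{\hat\lambda_1}\, \hat{u}_1\|_2^2 = (1 - \sqrt{\hat\lambda_1}\cos\theta)^2 + \hat\lambda_1 \sin^2\theta.
\]

Since $\cos\theta = \sqrt{1 - \sin^2\theta}$, we have

\[
1 \ge \sqrt{\hat\lambda_1}\cos\theta \ge \sqrt{1 - \epsilon_0 - \frac{\epsilon_0^2}{1 - \epsilon_0}} \ge 1 - \epsilon_0
\]

for $\epsilon_0 < 1/3$. Hence,

\[
\|e_1 - \sqrt{\hat\lambda_1}\, \hat{u}_1\|_2^2 \le \epsilon_0^2 + \frac{\epsilon_0^2}{1 - \epsilon_0} \le \frac{5}{2}\epsilon_0^2
\]

provided $\epsilon_0 < 1/3$. Since we always have

\[
\|e_1 - \sqrt{\hat\lambda_1}\, \hat{u}_1\|_2 \le \|e_1\|_2 + \sqrt{\hat\lambda_1}\, \|\hat{u}_1\|_2 \le 2,
\]

we have established

\[
\|e_1 - \sqrt{\hat\lambda_1}\, \hat{u}_1\|_2 \le C_0 \min(\epsilon, 1).
\]

This holds for all values of $\epsilon_0$ and proves the claim in the case where $\|x\|_2 = 1$. The general case is obtained via a simple rescaling.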
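The extraction step (top eigenvalue times top eigenvector) can be tried on a tiny example. The sketch below uses the closed-form eigendecomposition of a symmetric 2 x 2 matrix, so no linear-algebra library is needed; the perturbed matrix is an arbitrary illustrative choice that is positive semidefinite with trace at most 1, and the helper names are hypothetical.

```python
import math


def top_eigenpair_sym2(M):
    """Largest eigenvalue and a unit eigenvector of a symmetric 2 x 2 matrix.

    For M = ((a, b), (b, d)) the top eigenvalue is lam = mean + radius and
    (lam - d, b) is an eigenvector whenever it is nonzero. Its first entry
    is nonnegative, so the angle to e_1 is at most pi / 2.
    """
    (a, b), (_, d) = M
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    lam = mean + radius
    v = (lam - d, b)
    norm = math.hypot(v[0], v[1])
    if norm == 0.0:  # happens only when b = 0 and d >= a
        return lam, (0.0, 1.0)
    return lam, (v[0] / norm, v[1] / norm)


def rank_one_estimate(M):
    """Signal estimate sqrt(lam_1) * u_1 from the top eigenpair of M."""
    lam, u = top_eigenpair_sym2(M)
    scale = math.sqrt(max(lam, 0.0))
    return (scale * u[0], scale * u[1])


if __name__ == "__main__":
    X_hat = ((0.95, 0.05), (0.05, 0.02))  # perturbation of e_1 e_1^T
    x_hat = rank_one_estimate(X_hat)
    error = math.hypot(1.0 - x_hat[0], x_hat[1])
    print("estimate:", x_hat, "distance to e_1:", error)
```

For this example the perturbation has operator norm about 0.076 and the recovered vector lies within about 0.058 of $e_1$, in line with an error of the same order as $\epsilon_0$.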

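As mentioned above, we proved the theorem for Gaussian $z_i$'s but it is clear that our results hold true for vectors sampled uniformly at random on the sphere of radius $\sqrt{n}$. The reason is of course that $\|z_i\|_2$ deviates very little from $\sqrt{n}$. Formally, set $\tilde{z}_i = [\sqrt{n}/\|z_i\|_2]\, z_i$ so that these new vectors are independently and uniformly distributed on the sphere of radius $\sqrt{n}$. Then

\[
\langle X, \tilde{z}_i \tilde{z}_i^* \rangle = \frac{n}{\|z_i\|_2^2}\, \langle X, z_i z_i^* \rangle,
\]

and thus $\langle X, \tilde{z}_i \tilde{z}_i^* \rangle$ is between $(1 - \delta_2)\langle X, z_i z_i^* \rangle$ and $(1 + \delta_2)\langle X, z_i z_i^* \rangle$ with very high probability. This holds uniformly over all Hermitian matrices. Thus if $\tilde{\mathcal{A}}(X) = \{\langle X, \tilde{z}_i \tilde{z}_i^* \rangle\}_{1 \le i \le m}$,

\[
(1 - \delta_2)\|\mathcal{A}(X)\|_q \le \|\tilde{\mathcal{A}}(X)\|_q \le (1 + \delta_2)\|\mathcal{A}(X)\|_q
\]

for any $1 \le q \le \infty$.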
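The rescaling from Gaussian vectors to vectors on the sphere of radius $\sqrt{n}$ is simple to reproduce. The sketch below draws a real Gaussian vector, projects it to the sphere, and checks that a rank-one measurement changes exactly by the factor $n / \|z\|_2^2$, which is close to 1 in high dimension. The dimension, seed, and function names are choices made for this example.

```python
import math
import random


def gaussian_vector(n, rng):
    """Vector with i.i.d. N(0, 1) entries."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def project_to_sphere(z):
    """Rescale z to have Euclidean norm sqrt(n), n = len(z)."""
    scale = math.sqrt(len(z)) / math.sqrt(sum(v * v for v in z))
    return [scale * v for v in z]


def measurement_ratio(z):
    """Factor n / ||z||_2^2 linking the two kinds of measurements."""
    return len(z) / sum(v * v for v in z)


def rank_one_measurement(x, z):
    """<x x^T, z z^T> = (x . z)^2 for real vectors."""
    return sum(a * b for a, b in zip(x, z)) ** 2


if __name__ == "__main__":
    rng = random.Random(1)
    n = 4096
    z = gaussian_vector(n, rng)
    x = gaussian_vector(n, rng)
    z_tilde = project_to_sphere(z)
    print("ratio n / ||z||^2:", measurement_ratio(z))
    print(rank_one_measurement(x, z_tilde), measurement_ratio(z) * rank_one_measurement(x, z))
```

Since $\|z\|_2^2 / n$ has standard deviation $\sqrt{2/n}$ for Gaussian $z$, the ratio concentrates around 1 as $n$ grows, which is the sense in which the two sampling models nearly coincide.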

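Now take $b_i = |\langle x, \tilde{z}_i \rangle|^2 + \nu_i$ and solve (1.9) to get $\hat{X} = X + H$. Going through the same steps as above by using the relationships between $\mathcal{A}$ and $\tilde{\mathcal{A}}$ throughout, and by using the dual certificate $Y$ associated with $\mathcal{A}$, we obtain

\[
\|\tilde{\mathcal{A}}(H)\|_2 \le 2\epsilon, \qquad |\langle H, Y \rangle| \le 2\epsilon(1 + \delta_1)(1 + \delta_2),
\]

and

\[
\operatorname{Tr}(H_T^\perp) \le (1 + \delta_2) c_1 \epsilon, \qquad \|H_T\| \le (1 + \delta_2) c_2 \epsilon.
\]

Therefore,

\[
\|H\|_2 \le (1 + \delta_2)(\sqrt{2} c_2 + c_1)\epsilon.
\]

The rest of the proof goes through just the same.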

7 Numerical Simulations 

In this section we illustrate our theoretical results with numerical simulations. In particular, we will demonstrate PhaseLift's robustness vis a vis additive noise.

We consider the setup in Section 1.4, where the measurements are contaminated with additive noise. The solution to (1.9) is computed using the following regularized nuclear-norm minimization problem:

\[
\text{minimize} \quad \frac{1}{2}\|\mathcal{A}(X) - b\|_2^2 + \lambda\|X\|_1. \tag{7.1}
\]

It follows from standard optimization theory [19] that (7.1) is equivalent to (1.9) for some value of $\lambda$. Hence, we use (7.1) to compute the solution of (1.9) by determining, via a simple and efficient bisection search, the largest value $\lambda(\epsilon)$ such that $\|\mathcal{A}(\hat{X}) - b\|_2 \le \epsilon$. The numerical algorithm to solve (7.1) was implemented in Matlab using TFOCS [5]. We then extract the largest rank-1 component as described in Section 1.4 to obtain an approximation $\hat{x}$.
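The bisection search over the regularization parameter can be sketched on a toy problem. The example below is not the TFOCS-based implementation mentioned above: it replaces the matrix problem by a scalar analogue, minimizing $\frac{1}{2}(x - b)^2 + \lambda|x|$, whose solution is the soft-thresholding of $b$ and whose residual $|x - b| = \min(\lambda, |b|)$ is nondecreasing in $\lambda$. All names and the search interval are assumptions of this illustration.

```python
def soft_threshold(b, lam):
    """Minimizer of (1/2)(x - b)^2 + lam * |x| over real x."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def residual(b, lam):
    """Data misfit |x(lam) - b| of the regularized scalar solution."""
    return abs(soft_threshold(b, lam) - b)


def largest_lambda(residual_fn, eps, lam_hi, tol=1e-10):
    """Largest lam in [0, lam_hi] with residual_fn(lam) <= eps, by bisection.

    Assumes residual_fn is nondecreasing in lam and residual_fn(0) <= eps.
    The returned value is feasible and within tol of the true threshold.
    """
    lo, hi = 0.0, lam_hi
    if residual_fn(hi) <= eps:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual_fn(mid) <= eps:
            lo = mid  # still feasible: move the lower end up
        else:
            hi = mid  # misfit too large: move the upper end down
    return lo


if __name__ == "__main__":
    b, eps = 3.0, 0.5
    lam = largest_lambda(lambda l: residual(b, l), eps, lam_hi=10.0)
    print("lambda(eps) ~", lam, "residual:", residual(b, lam))
```

In the matrix setting each evaluation of the residual requires solving (7.1) for the trial value of $\lambda$, so the cost of the search is a modest multiple of one solve.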

We will use the relative mean squared error (MSE) and the relative root mean squared error (RMS) to measure performance. However, since a solution is only unique up to global phase, it does not make sense to compute the distance between $x$ and its approximation $\hat{x}$. Instead we compute the distance modulo a global phase term and define the relative MSE between $x$ and $\hat{x}$ as

\[
\min_{c : |c| = 1} \frac{\|cx - \hat{x}\|_2^2}{\|x\|_2^2}.
\]

The (relative) RMS is just the square root of the (relative) MSE.
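The minimization over the global phase has a closed-form solution, which the following sketch uses: expanding $\|cx - \hat{x}\|_2^2$ shows that the optimal unit-modulus factor is $c = \langle x, \hat{x} \rangle / |\langle x, \hat{x} \rangle|$ with $\langle x, \hat{x} \rangle = \sum_i \bar{x}_i \hat{x}_i$. The function names are illustrative and the vectors are plain Python lists of complex numbers.

```python
import cmath
import math


def relative_mse(x, x_hat):
    """min over |c| = 1 of ||c x - x_hat||_2^2 / ||x||_2^2."""
    inner = sum(a.conjugate() * b for a, b in zip(x, x_hat))
    # Optimal phase aligns c * x with x_hat; any phase works if inner == 0.
    c = inner / abs(inner) if abs(inner) > 0 else 1.0
    error = sum(abs(c * a - b) ** 2 for a, b in zip(x, x_hat))
    return error / sum(abs(a) ** 2 for a in x)


def relative_rms(x, x_hat):
    """Square root of the relative MSE."""
    return math.sqrt(relative_mse(x, x_hat))


if __name__ == "__main__":
    x = [1 + 2j, -0.5j, 3 + 0j]
    rotated = [cmath.exp(0.7j) * v for v in x]  # same signal, different phase
    print(relative_mse(x, rotated))             # essentially zero
    print(relative_mse(x, [2 * v for v in x]))  # amplitude error: 1.0
```

A pure phase rotation of the signal therefore counts as a perfect reconstruction, while an amplitude error does not.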

In the first set of experiments, we investigate how the reconstruction algorithm performs as the noise level increases. The test signal is a complex-valued signal of length $n = 128$ with independent Gaussian complex entries (each entry is of the form $a + ib$ where $a$ and $b$ are independent $\mathcal{N}(0, 1)$ variables) so that the real and imaginary parts are independent white noise sequences. Obviously, the signal is arbitrary. We use $m = 6n$ measurement vectors sampled independently on the unit sphere of $\mathbb{C}^n$.

We generate noisy data from both a Gaussian model and a Poisson model. In the Gaussian model, $b_i \sim \mathcal{N}(\mu_i, \sigma^2)$ where $\mu_i = |\langle x, z_i \rangle|^2$ and $\sigma$ is adjusted so that the total noise power is bounded by $\epsilon^2$. In the Poisson model, $b_i \sim \mathrm{Poi}(\mu_i)$ and the noise $b_i - \mu_i$ is rescaled to achieve a desired total power as above (we might do without this rescaling as well but have decided to work with a prescribed signal-to-noise ratio (SNR) for simplicity of exposition). We do this for five different SNR levels,* ranging from 5dB to 100dB. However, we point out that we do not make use of the noise statistics in our reconstruction algorithm,** since our purpose is only to assume an upper bound on the total noise power, as in Theorem 1.2.
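Rescaling a noise vector to a prescribed total power is a one-line operation, sketched below. The example assumes that the SNR is measured in decibels as $10 \log_{10}$ of the ratio between the squared norm of a reference vector and the squared norm of the noise; which vector serves as the reference is a choice left to the caller, and the function names are hypothetical.

```python
import math
import random


def squared_norm(v):
    """Sum of squares of a real vector."""
    return sum(a * a for a in v)


def noise_norm_for_snr(reference, snr_db):
    """Noise norm giving 10 log10(||reference||^2 / ||noise||^2) = snr_db."""
    return math.sqrt(squared_norm(reference)) * 10.0 ** (-snr_db / 20.0)


def rescale_noise(noise, target_norm):
    """Rescale a nonzero noise vector to have Euclidean norm target_norm."""
    norm = math.sqrt(squared_norm(noise))
    if norm == 0.0:
        raise ValueError("cannot rescale an all-zero noise vector")
    return [target_norm * a / norm for a in noise]


def snr_db(reference, noise):
    """SNR in dB of `noise` relative to `reference`."""
    return 10.0 * math.log10(squared_norm(reference) / squared_norm(noise))


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [1.0, 4.0, 9.0, 2.0]                   # illustrative noiseless data
    raw = [rng.gauss(0.0, 1.0) for _ in mu]     # unscaled noise draw
    nu = rescale_noise(raw, noise_norm_for_snr(mu, 15.0))
    b = [m_i + n_i for m_i, n_i in zip(mu, nu)]  # noisy measurements
    print("achieved SNR (dB):", snr_db(mu, nu))
```

The same rescaling applies to centered Poisson noise $b_i - \mu_i$ once the raw draws are available; only the total power is controlled, not the noise statistics.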

For each SNR level, we repeat the experiment ten times with different noise terms, different signals, and different random measurement vectors; we then record the average relative RMS over these ten experiments. Figure 4(a) shows the average relative MSE in dB (the values of $10 \log_{10}(\text{rel. MSE})$ are plotted) versus the SNR for Poisson noise. In each case, the performance degrades very gracefully with decreasing SNR, as predicted by Theorem 1.2. Debiasing as described at the end of Section 1.4 leads to a further improvement in the reconstruction for low SNR, as illustrated in Figure 4(b). The results for Gaussian noise are comparable; see Figure 5.

In the next experiment, we collect Poisson data about a complex-valued random signal just as above, and work with a fixed SNR set to 15dB. The number of measurements varies so that the oversampling rate $m/n$ is between 5 and 22 ($m$ is thus between $n \log n$ and $4.5\, n \log n$). We repeat the experiment ten times with different noise terms and different random measurement vectors for each oversampling rate; we then record the average relative RMS. Figure 6 shows the average relative RMS of the solution to (1.5) versus the oversampling rate. We observe that the RMS decreases in inverse proportion to the number of measurements. For instance, the error reduces by a factor of two when we double the number of measurements. If instead we hold the standard deviation of the errors at a constant level, the mean squared error (MSE) reduces by a factor of about two when we double the number of measurements.

* The SNR of two signals $x$, $\hat{x}$ with respect to $x$ is defined as $10 \log_{10}\bigl(\|x\|_2^2 / \|x - \hat{x}\|_2^2\bigr)$. So we say that the SNR is 10dB if $10 \log_{10}\bigl(\|x\|_2^2 / \|\nu\|_2^2\bigr) = 10$.

** We refer to [7] for efficient ways to incorporate statistical noise models into the reconstruction algorithm.

[Figure 4: panels (a) and (b); horizontal axes: SNR (dB)]

Figure 4: Performance of PhaseLift for Poisson noise. The stability of the algorithm is apparent as its performance degrades gracefully with decreasing SNR. (a) Relative MSE on a log-scale for the non-debiased recovery. (b) Relative RMS for the original and debiased recovery.

Figure 5: Performance of PhaseLift for Gaussian noise. (a) Relative MSE on a log-scale for the non-debiased recovery. (b) Relative RMS for the original and the debiased recovery.

Figure 6: Oversampling rate versus relative RMS.

8 Discussion 

In this paper, we have shown that it is possible to recover a signal exactly (up to a global phase factor) from the knowledge of the magnitude of its inner products with a family of sensing vectors $\{z_i\}$. The fact that on the order of $n \log n$ magnitude measurements $|\langle x, z_i \rangle|^2$ uniquely determine $x$ is not surprising. The part we find unexpected, however, is that what appears to be a combinatorial problem is solved exactly by a convex program. Further, we have established the existence of a noise-aware recovery procedure, also based on a tractable convex program, which is robust vis a vis additive noise. To the best of our knowledge, there are no other results of this kind about the recovery of an arbitrary signal from noisy quadratic data.

An appealing research direction is to study the recovery of a signal from other types of intensity measurements, and consider other families of sensing vectors. In particular, structured random families would be of great interest. It also seems plausible that assuming stochastic errors in Theorem 1.2 would allow one to derive sharper error bounds; it would be of interest to know if this is indeed the case. We leave this to future work.